Handling data#
The “dataBox”#
- class flx.dataBox#
Handles the processing/storage of data-points.
- __init__(M_in, M_out)#
Initialize an empty data-box.
- write2mem(config)#
Allocate memory for storing data-points.
- Parameters:
config (dict) –
The following keys are allowed in config:
- N_reserve (type unsigned long): The total number of data-points the memory can hold.
- cols (string or list, default: “all”): … see parameter cols in flx.dataBox.write2file().
- Return type:
None
- extract_col_from_mem(col)#
Return a numpy-array that points to the memory of col (i.e., new memory is not allocated).
- Parameters:
col (dataBox_colID) – An identifier for the data-column to extract.
- Return type:
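The no-copy behaviour described for extract_col_from_mem() can be pictured with plain NumPy. The sketch below is only an analogy: the buffer `storage`, its one-row-per-data-point layout and the float32 element type are assumptions made for illustration and do not describe how flx.dataBox organises its memory.

```python
import numpy as np

# Hypothetical stand-in for the memory reserved by write2mem():
# 4 data-points with 3 columns each (layout and dtype are assumptions).
storage = np.arange(12, dtype=np.float32).reshape(4, 3)

# Slicing out a column yields a view: no new memory is allocated.
col_view = storage[:, 1]
shares = np.shares_memory(storage, col_view)

# Writing through the view changes the underlying buffer.
col_view[0] = -1.0
```

An array of this kind only refers to the buffer. If the values have to stay independent of it, `col_view.copy()` creates a separate array.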
- free_mem()#
Free the allocated memory for storing data.
- Return type:
None
- write2file(config)#
Set the handle to write data-points to a file / output stream.
- Parameters:
config (dict) –
The following keys are allowed in config:
- fname (type string): The name of the file to open for output.
- append (bool, default: True): True: Append output to an existing file. False: Overwrite an existing file.
- binary (bool, default: True): True: Output data in binary format. False: Create a human-readable text file.
- cols (string or list, default: “all”): If a string is provided, the following keywords are accepted:
  - all: all data columns (i.e., model output and model input) are sent to the output stream. The model output is written first, followed by the model input.
  - all_in: only the model input is written to the output stream.
  - all_out: only the model output is written to the output stream.
  If a list is provided, the list must be composed of entries of type dataBox_colID.
- Return type:
None
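The column order implied by the cols keywords can be sketched as follows. The helper `columns_for_keyword` is hypothetical and not part of flx; it only mirrors the rule stated above that the model output precedes the model input in the full set of data-columns.

```python
def columns_for_keyword(keyword, M_in, M_out):
    """Return full-set column indices for a 'cols' keyword (hypothetical helper)."""
    out_cols = list(range(M_out))               # output columns come first
    in_cols = list(range(M_out, M_out + M_in))  # input columns follow
    if keyword == "all":
        return out_cols + in_cols
    if keyword == "all_in":
        return in_cols
    if keyword == "all_out":
        return out_cols
    raise ValueError(f"unknown keyword: {keyword!r}")


# Two input columns and one output column.
all_cols = columns_for_keyword("all", M_in=2, M_out=1)
in_only = columns_for_keyword("all_in", M_in=2, M_out=1)
out_only = columns_for_keyword("all_out", M_in=2, M_out=1)
```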
- read_from_file(config)#
Import data-points from a file.
The total number of values stored in the file must be a multiple of M_in+M_out.
- Parameters:
config (dict) –
The following keys are allowed in config:
- fname (type string): The name of the file to read from.
- binary (bool, default: True): True: Input data in binary format. False: Input from a human-readable text file.
- N_max (int, default: 0): If larger than zero, the maximum number of values read from the file.
- Return type:
None
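The requirement that the number of stored values is a multiple of M_in+M_out can be illustrated with NumPy alone. This is a sketch under assumptions: the file is taken to hold raw float32 values without a header, one data-point after another, and the file name is made up. The actual binary format used by flx.dataBox is not specified here.

```python
import os
import tempfile

import numpy as np

M_in, M_out = 2, 1
n_cols = M_out + M_in

# Three data-points, each stored as (output, input_0, input_1).
points = np.array([[0.5, 5.0, 1.0],
                   [-0.2, 4.1, 1.3],
                   [1.7, 6.2, 0.9]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples_sketch.bin")  # hypothetical file name
    points.tofile(path)                              # raw values, no header
    flat = np.fromfile(path, dtype=np.float32)

# The value count must be a multiple of M_in+M_out.
if flat.size % n_cols != 0:
    raise ValueError("file does not hold complete data-points")
restored = flat.reshape(-1, n_cols)
```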
- close_file()#
Closes an open file stream. No more samples will be written to the file.
- Return type:
None
- register_post_processor(config)#
Registers a new post-processor and returns it.
- Parameters:
config (dataBox_postProc) – The configuration of the post-processor.
- Return type:
- type dataBox_colID#
- Syntax:
COL
- Description:
The configuration used to identify a data-column in a flx.dataBox.
The following types are accepted for COL:
- int: An integer that specifies the ID of a data-column. The numbering of column IDs for the model output starts with zero. The numbering of the column IDs for the model input starts with the total number of output columns. The value must be non-negative.
- dict: A Python dictionary that expects the following keys:
  - set (type string): Specifies how the id is interpreted. The value must be either full (full set of data-columns), in (data-columns of the input) or out (data-columns of the output).
  - id (type int): The index of the data-column.
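The numbering rules for dataBox_colID can be sketched as a small resolver that maps either form to an index in the full set of data-columns. The function `resolve_col_id` is hypothetical; it only restates the rules above (output columns first, input columns offset by the number of output columns).

```python
def resolve_col_id(col, M_in, M_out):
    """Map a column identifier to a full-set index (hypothetical helper)."""
    if isinstance(col, dict):
        kind, idx = col["set"], col["id"]
    else:
        kind, idx = "full", col  # a plain int counts on the full set
    if isinstance(idx, bool) or not isinstance(idx, int) or idx < 0:
        raise ValueError("column index must be a non-negative integer")
    # (offset into the full set, number of columns in the chosen set)
    layout = {"full": (0, M_out + M_in), "out": (0, M_out), "in": (M_out, M_in)}
    if kind not in layout:
        raise ValueError(f"unknown set: {kind!r}")
    offset, size = layout[kind]
    if idx >= size:
        raise IndexError(f"index {idx} out of range for set {kind!r}")
    return offset + idx


# With one output and two inputs, the second input column is full-set column 2.
second_input = resolve_col_id({"set": "in", "id": 1}, M_in=2, M_out=1)
```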
Post-processors#
Overview#
- type dataBox_postProc#
- Syntax:
CONFIG
- Description:
The configuration used to initialize a post-processor for a flx.dataBox, where CONFIG is of type dict.
- The following keys are allowed independent of the type of the post-processor:
  - type (dataBox_postProc_type): The type of the post-processor (required).
Additionally, depending on the specified type of the post-processor, other keys can be required for definition; see section Types of post-processors.
- type dataBox_postProc_type#
- Syntax:
TYPE
- Description:
Specifies the type of a post-processor for a flx.dataBox.
- The following values/types for post-processors can be used:
  - counter
  - mean_double
  - mean_pdouble
  - mean_qdouble
  - vdouble
  - reliability
  - filter
The state of a post-processor can be retrieved by means of the function flx.dataBox.postProc.eval().
- class flx.dataBox.postProc#
A post-processor for a flx.dataBox.
- eval()#
- Syntax:
flx.dataBox.postProc.eval()
- Description:
Returns the current state of the post-processor. The states of the different post-processors are documented in section Types of post-processors.
- Return type:
Types of post-processors#
Statistical analysis#
counter#
- property counter#
A post-processor that counts the number of samples added.
- States:
When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:
- N (int): The total number of samples added.
mean_double#
- property mean_double#
A post-processor that tracks the mean of the data-column based on a floating-point variable. It is a fast post-processor. However, for large sums, accuracy can become an issue due to floating-point precision.
- Parametrization:
Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc_type. The following parameters are accepted:
- col (dataBox_colID): An identifier for the data-column to track.
- States:
When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:
- mean (float): The sample mean of the tracked data-column.
- N (int): The total number of samples of the tracked data-column.
mean_pdouble#
- property mean_pdouble#
A post-processor that tracks the mean of the data-column based on a special floating-point variable that minimizes potential numerical summation errors (based on the Kahan summation algorithm).
- Parametrization:
Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc_type. The following parameters are accepted:
- col (dataBox_colID): An identifier for the data-column to track.
- States:
When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:
- mean (float): The sample mean of the tracked data-column.
- N (int): The total number of samples of the tracked data-column.
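The difference between plain accumulation (as in mean_double) and compensated accumulation (as in mean_pdouble) can be sketched in pure Python. The functions `naive_mean` and `kahan_mean` are illustrative names, not part of flx, and the code shows the textbook Kahan algorithm rather than the library's internal implementation.

```python
import math


def naive_mean(values):
    """Mean via plain floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def kahan_mean(values):
    """Mean via Kahan (compensated) summation."""
    total = 0.0
    comp = 0.0  # running estimate of the lost low-order bits
    n = 0
    for v in values:
        y = v - comp            # apply the correction from the previous step
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost
        total = t
        n += 1
    return total / n


samples = [0.1] * 100_000
reference = math.fsum(samples) / len(samples)  # correctly rounded sum
err_naive = abs(naive_mean(samples) - reference)
err_kahan = abs(kahan_mean(samples) - reference)
```

For long runs of samples, `err_kahan` stays near machine precision while `err_naive` grows with the number of additions.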
mean_qdouble#
- property mean_qdouble#
A post-processor that tracks the mean of the data-column based on a special floating-point variable that minimizes potential numerical summation errors (based on performing the summation in separate bins).
- Parametrization:
Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc_type. The following parameters are accepted:
- col (dataBox_colID): An identifier for the data-column to track.
- NpV (int): A number of points. Value must be larger than zero.
- ppb (bool): If True, NpV is interpreted as the number of points per summation bin. If False, NpV is interpreted as the total number of samples, and the number of bins is estimated from this number.
- States:
When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:
- mean (float): The sample mean of the tracked data-column.
- N (int): The total number of samples of the tracked data-column.
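One possible reading of summation in separate bins is sketched below: values are accumulated in bins of a fixed size, and the bin totals are added at the end, so that no single accumulator grows much larger than the values added to it. The function `binned_mean` and its argument `points_per_bin` are hypothetical and correspond only loosely to NpV with ppb set to True; how flx estimates the number of bins when ppb is False is not reproduced here.

```python
import math


def binned_mean(values, points_per_bin):
    """Mean via summation in fixed-size bins (illustrative sketch)."""
    if points_per_bin <= 0:
        raise ValueError("points_per_bin must be larger than zero")
    bin_sums = []
    current = 0.0
    count_in_bin = 0
    n = 0
    for v in values:
        current += v
        count_in_bin += 1
        n += 1
        if count_in_bin == points_per_bin:  # bin is full: store and restart
            bin_sums.append(current)
            current = 0.0
            count_in_bin = 0
    if count_in_bin:                        # keep the partially filled last bin
        bin_sums.append(current)
    if n == 0:
        raise ValueError("no values provided")
    return sum(bin_sums) / n


small = binned_mean([1.0, 2.0, 3.0, 4.0, 5.0], points_per_bin=2)
large = binned_mean([0.1] * 1000, points_per_bin=10)
```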
vdouble#
- property vdouble#
A post-processor that tracks the mean and the variance of the data-column. A special floating-point variable (based on the Kahan summation algorithm) is used to increase floating-point precision.
- Parametrization:
Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc_type. The following parameters are accepted:
- col (dataBox_colID): An identifier for the data-column to track.
- States:
When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:
- mean (float): The sample mean of the tracked data-column.
- sd (float): The sample standard deviation of the tracked data-column.
- var (float): The sample variance of the tracked data-column.
- N (int): The total number of samples of the tracked data-column.
- rv_mean (flx.rv): A random variable quantifying the uncertainty about the mean value of the tracked data-column.
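Tracking mean and variance sample by sample can be sketched with Welford's online algorithm. Note that this is a different technique from the Kahan-based accumulation mentioned above; it is used here only because it is compact. The class `RunningMoments` is hypothetical, the N-1 denominator for the sample variance is an assumption, and the rv_mean state is not reproduced.

```python
import math


class RunningMoments:
    """Online mean/variance via Welford's algorithm (illustrative sketch)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def state(self):
        # Assumption: unbiased sample variance (division by N-1).
        var = self.m2 / (self.n - 1) if self.n > 1 else float("nan")
        return {"mean": self.mean, "var": var, "sd": math.sqrt(var), "N": self.n}


tracker = RunningMoments()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    tracker.add(value)
moments = tracker.state()
```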
Reliability analysis#
reliability#
- property reliability#
A post-processor that interprets the values of the data-column as a limit-state function and tracks the reliability.
- Parametrization:
Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc_type. The following parameters are accepted:
- col (dataBox_colID): An identifier for the data-column to track.
- States:
When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:
- N (int): The total number of samples of the tracked data-column.
- H (int): The number of samples of the tracked data-column with a limit-state function value less than or equal to zero.
- mean_freq (float): Frequentist estimate of the mean value.
- mean_bayes (float): Bayesian estimate of the mean value.
- rv_pf (flx.rv): A random variable quantifying the uncertainty about the probability of failure of the tracked data-column.
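The counting behind N and H, and two common point estimates of the probability of failure, can be sketched as follows. The function `reliability_state` is hypothetical. The frequentist estimate is taken as H/N; for the Bayesian estimate, the posterior mean under a uniform Beta(1, 1) prior is used, which is an assumption, since the prior used by flx is not stated here. The rv_pf state is not reproduced.

```python
def reliability_state(g_values, prior_a=1.0, prior_b=1.0):
    """Count failures (g <= 0) and estimate the failure probability (sketch)."""
    N = len(g_values)
    if N == 0:
        raise ValueError("no samples provided")
    H = sum(1 for g in g_values if g <= 0.0)
    mean_freq = H / N
    # Posterior mean of a Beta(prior_a + H, prior_b + N - H) distribution;
    # the Beta(1, 1) default prior is an assumption.
    mean_bayes = (H + prior_a) / (N + prior_a + prior_b)
    return {"N": N, "H": H, "mean_freq": mean_freq, "mean_bayes": mean_bayes}


# Limit-state values; a value of exactly zero counts as failure.
rel = reliability_state([-1.0, 0.0, 0.5, 2.0, 3.0])
```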
Filtering of samples#
filter#
- property filter#
A post-processor that stores samples (in memory) only if a condition is met.
- Parametrization:
Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc_type. The following parameters are accepted:
- col (dataBox_colID): An identifier for the data-column to track.
- N_reserve (int, default: 1000000): Maximum number of data-points that can be stored.
- cond (flxParaFun, default: None): A functional expression that accepts an array of the current data-point as input. The input array is of size M_out+M_in (compare flx.dataBox); it consists of the output vector followed by the input vector of the associated flx.dataBox.
If cond is not explicitly specified, the sample is accepted if the value associated with data-column col is less than or equal to zero.
- States:
When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:
- N (int): The total number of samples accepted by the post-processor at hand.
- data (numpy.ndarray[float]): A reference to the data accepted by the post-processor at hand.
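The acceptance rule of the filter can be sketched with NumPy. The function `filter_points` is hypothetical: it evaluates the condition on the whole data-point (output vector followed by input vector) and stores only the value of the tracked column, falling back to the rule "tracked value less than or equal to zero" when no condition is given. The column is passed as a full-set index for simplicity.

```python
import numpy as np


def filter_points(points, col, cond=None):
    """Keep the tracked column of every data-point that meets cond (sketch)."""
    if cond is None:
        # Default: accept if the tracked value is less than or equal to zero.
        cond = lambda vec: vec[col] <= 0.0
    accepted = [row[col] for row in points if cond(row)]
    return {"N": len(accepted), "data": np.asarray(accepted, dtype=points.dtype)}


# Each row: (output g, input R, input S); the values are made up.
points = np.array([[-0.5, 4.0, 4.5],
                   [1.0, 5.0, 4.0],
                   [0.0, 3.0, 3.0]])

default_res = filter_points(points, col=0)
custom_res = filter_points(points, col=2, cond=lambda vec: vec[0] <= 0.0)
```

In the second call the condition tests the output while the stored values come from an input column, which separates what is tested from what is kept.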
Examples#
import fesslix as flx
flx.load_engine()
import fesslix.model_templates
import numpy as np
Random Number Generator: MT19937 - initialized with rand()=1985436705;
Random Number Generator: MT19937 - initialized with 1000 initial calls.
Write samples to a file and to memory#
## ==============================================
## Generate model
## ==============================================
my_model = fesslix.model_templates.generate_reliability_R_S_example()
## ==============================================
## Set up dataBox
## ==============================================
dBox_1 = flx.dataBox(my_model['sampler'].get_NOX(),len(my_model['model']))
## ----------------------------------------
## set up writing to a file
## ----------------------------------------
dBox_1.write2file( {
    'fname': "mcs_samples.bin",
    'append': False,
    'binary': True,
    'cols': 'all'
} )
## ----------------------------------------
## set up storing data in memory
## ----------------------------------------
dBox_1.write2mem( {
    'N_reserve': int(1e6),
    'cols': 'all'
} )
## ----------------------------------------
## register post-processors
## ----------------------------------------
pp_1a = dBox_1.register_post_processor({ 'type':'mean_double', 'col':{ 'set':'in', 'id':0} })
pp_1b = dBox_1.register_post_processor({ 'type':'mean_pdouble', 'col':{ 'set':'in', 'id':0} })
pp_1c = dBox_1.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':0} })
pp_1d = dBox_1.register_post_processor({ 'type':'vdouble', 'col':{ 'set':'in', 'id':0} })
## ==============================================
## Perform the Monte Carlo simulation
## ==============================================
my_model['sampler'].perform_MCS(10000,my_model['model'],dBox_1)
## ==============================================
## Close the file stream of dBox_1
## ==============================================
dBox_1.close_file()
## ==============================================
## Extract a data-column from dBox_1
## ==============================================
data_fvec = dBox_1.extract_col_from_mem( { 'set':'in', 'id':1} )
print( f"mean of S: {np.mean(data_fvec):.2f}" )
## ==============================================
## Evaluate post-processors
## ==============================================
print( "pp_1a:", pp_1a.eval() )
print( "pp_1b:", pp_1b.eval() )
print( "pp_1c:", pp_1c.eval() )
pp_1d_res = pp_1d.eval()
print( "pp_1d:", pp_1d_res, pp_1d_res['rv_mean'].mean() )
mean of S: 1.03
pp_1a: {'N': 10000, 'mean': 5.008957917584272}
pp_1b: {'N': 10000, 'mean': 5.008957917584253}
pp_1c: {'mean': 5.008957917584252, 'N': 10000}
pp_1d: {'mean': 5.008957917584253, 'var': 1.0364409474900838, 'sd': 1.018057438207729, 'N': 10000, 'rv_mean': <fesslix.core.rv object at 0x74d54010c1f0>} 5.008957917584253
Import samples from a binary file#
## ==============================================
## Set up dataBox and post-processors
## ==============================================
dBox_2 = flx.dataBox(2,1)
pp_2a = dBox_2.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':0} })
pp_2b = dBox_2.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':1} })
dBox_2.write2file( {
    'fname': "mcs_samples.dat",
    'append': False,
    'binary': False,
    'cols': 'all_in'
} )
## ==============================================
## import data from file
## ==============================================
dBox_2.read_from_file({ 'fname': "mcs_samples.bin", 'binary': True })
## ==============================================
## Evaluate post-processors
## ==============================================
print( "pp_2a:", pp_2a.eval() )
print( "pp_2b:", pp_2b.eval() )
pp_2a: {'mean': 5.00895791964531, 'N': 10000}
pp_2b: {'mean': 1.0293795454162755, 'N': 10000}
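The mean reported by pp_2a agrees with the one reported by pp_1c in the first example only to about nine significant digits. The filter output in the last example shows arrays of dtype float32, so one plausible explanation is that stored values are kept in single precision. This is an assumption and is not stated on this page. The NumPy sketch below uses synthetic data (not the samples from the example) to show the size of the effect that such a round-trip has on a mean.

```python
import numpy as np

rng = np.random.default_rng(0)
values64 = rng.normal(loc=5.0, scale=1.0, size=10_000)  # synthetic samples

# Round-trip through single precision, as a float32 file would do.
values32 = values64.astype(np.float32).astype(np.float64)

mean64 = values64.mean()
mean32 = values32.mean()
diff = abs(mean64 - mean32)
```

Each value is perturbed by at most half a float32 unit in the last place, so the two means can differ only far behind the decimal point.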
Import samples from a text-file#
## ==============================================
## Set up dataBox and post-processors
## ==============================================
dBox_3 = flx.dataBox(2,0)
pp_3a = dBox_3.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':0} })
pp_3b = dBox_3.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':1} })
## ==============================================
## import data from file
## ==============================================
dBox_3.read_from_file({ 'fname': "mcs_samples.dat", 'binary': False })
## ==============================================
## Evaluate post-processors
## ==============================================
print( "pp_3a:", pp_3a.eval() )
print( "pp_3b:", pp_3b.eval() )
pp_3a: {'mean': 5.0089579216999995, 'N': 10000}
pp_3b: {'mean': 1.0293795398482901, 'N': 10000}
Store samples based on a condition#
## ==============================================
## Set up dataBox and post-processors
## ==============================================
dBox_4 = flx.dataBox(my_model['sampler'].get_NOX(),len(my_model['model']))
## ----------------------------------------------
## use default for 'cond'
## ----------------------------------------------
## if 'cond' is not specified, the default behavior is to accept only values <=0.
pp_4a = dBox_4.register_post_processor({
    'type':'filter',
    'col':{ 'set':'out', 'id':0}
})
## ----------------------------------------------
## set a FlxFunction for 'cond'
## ----------------------------------------------
## we can also explicitly state this as ...
pp_4b = dBox_4.register_post_processor({
    'type':'filter',
    'col':{ 'set':'in', 'id':1},
    'cond': "$1<=0."
})
## ----------------------------------------------
## use a Python function for 'cond'
## ----------------------------------------------
## as an alternative, a Python function can be used
## However, this is likely less efficient
def help_fun_4c(vec):
    return vec[0]<=0.
pp_4c = dBox_4.register_post_processor({
    'type':'filter',
    'col':{ 'set':'in', 'id':1},
    'cond': help_fun_4c
})
## ==============================================
## Perform the Monte Carlo simulation
## ==============================================
my_model['sampler'].perform_MCS(int(1e5),my_model['model'],dBox_4)
## ==============================================
## Evaluate post-processors
## ==============================================
print( "pp_4a:", pp_4a.eval() )
print( "pp_4b:", pp_4b.eval() )
print( "pp_4c:", pp_4c.eval() )
pp_4a: {'N': 3563, 'p': 0.03563, 'data': array([-1.4622718 , -0.21617003, -3.2868843 , ..., -2.7884986 ,
-0.67784244, -0.57807434], shape=(3563,), dtype=float32)}
pp_4b: {'N': 3563, 'p': 0.03563, 'data': array([4.212464 , 5.629917 , 7.4938693, ..., 7.261809 , 4.4445415,
4.2936106], shape=(3563,), dtype=float32)}
pp_4c: {'N': 3563, 'p': 0.03563, 'data': array([4.212464 , 5.629917 , 7.4938693, ..., 7.261809 , 4.4445415,
4.2936106], shape=(3563,), dtype=float32)}