Handling data

The “dataBox”

class flx.dataBox

Handles the processing/storage of data-points.

__init__(M_in, M_out)

Initialize an empty data-box.

Parameters:

M_in (int) – The dimension of the input vector. Value must be positive or zero.
M_out (int) – The dimension of the output vector. Value must be positive or zero.

write2mem(config)

Allocate memory for storing data-points.

Parameters:

config (dict) –

The following keys are allowed in config:

N_reserve (type unsigned long): The total number of data-points the memory can hold.
cols (string or list, default: “all”): … see parameter cols in flx.dataBox.write2file().

Return type:

None

extract_col_from_mem(col)

Return a NumPy array that points to the stored memory of column col (i.e., no new memory is allocated).

Parameters: col (dataBox_colID) – An identifier for the data-column to extract.
Return type: numpy.ndarray[float]
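“Points to the memory” means the returned array is a view rather than a copy. The sketch below illustrates that distinction with plain NumPy. The buffer layout (one row per data-point, output columns before input columns) is a hypothetical choice for illustration and is not a description of fesslix internals.

```python
import numpy as np

# Hypothetical storage: one row per data-point, M_out output columns
# followed by M_in input columns (assumed layout, for illustration only).
M_out, M_in, N_reserve = 1, 2, 5
buffer = np.zeros((N_reserve, M_out + M_in))

# Basic slicing returns a view: no new memory is allocated.
col_view = buffer[:, M_out + 1]            # second input column
# Fancy indexing allocates new memory (a copy) instead.
col_copy = buffer[:, [M_out + 1]].ravel()

col_view[0] = 42.0                         # writes through to the buffer
print(np.shares_memory(col_view, buffer))  # True
print(np.shares_memory(col_copy, buffer))  # False
print(buffer[0, M_out + 1])                # 42.0
```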

free_mem()

Free the allocated memory for storing data.

Return type: None

write2file(config)

Set the handle to write data-points to a file / output stream.

Parameters:

config (dict) –

The following keys are allowed in config:

fname (type string): The name of the file to open for output.
append (bool, default: True): True: Append output to an existing file. False: Overwrite an existing file.
binary (bool, default: True): True: Output data in binary format. False: Create a human-readable text file.
cols (string or list, default: “all”):
If a string is provided, the following keywords are accepted:
all: all data columns (i.e., model output and model input) are sent to the output stream. The model output is written first, followed by the model input.

all_in: only the model input is written to the output stream.

all_out: only the model output is written to the output stream.
If a list is provided, the list must be composed of entries of type dataBox_colID.

Return type:

None
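The column-ordering rule for cols can be sketched as a function that maps a cols setting to flat column indices, with outputs numbered first and inputs after them. The function name select_columns is hypothetical, and, as a simplification, a list is assumed here to contain plain integer column IDs only (the dict form of dataBox_colID is not handled).

```python
from typing import List, Sequence, Union


def select_columns(cols: Union[str, Sequence[int]], m_in: int, m_out: int) -> List[int]:
    """Hypothetical helper: map a 'cols' setting to flat column indices."""
    n_total = m_out + m_in
    if isinstance(cols, str):
        if cols == "all":
            return list(range(n_total))          # output columns, then input columns
        if cols == "all_in":
            return list(range(m_out, n_total))   # input columns only
        if cols == "all_out":
            return list(range(m_out))            # output columns only
        raise ValueError(f"unknown keyword: {cols!r}")
    # Simplification: only integer column IDs are accepted in this sketch.
    indices = [int(c) for c in cols]
    for idx in indices:
        if not 0 <= idx < n_total:
            raise ValueError(f"column ID out of range: {idx}")
    return indices


print(select_columns("all", m_in=2, m_out=1))      # [0, 1, 2]
print(select_columns("all_in", m_in=2, m_out=1))   # [1, 2]
print(select_columns("all_out", m_in=2, m_out=1))  # [0]
```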

read_from_file(config)

Import data-points from a file.

The total number of values stored in the file must be a multiple of M_in+M_out.

Parameters:

config (dict) –

The following keys are allowed in config:

fname (type string): The name of the file to read from.
binary (bool, default: True): True: Read data in binary format. False: Read data from a human-readable text file.
N_max (int, default: 0): If larger than zero, it specifies an upper limit on how many numbers are read from the file.

Return type:

None
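The requirement that the number of stored values be a multiple of M_in+M_out follows from each data-point occupying M_in+M_out consecutive values. The sketch below groups a flat sequence accordingly. It assumes the values of a data-point are ordered output first, then input (matching cols “all”), and that a text file holds whitespace-separated numbers; both are assumptions, and split_into_points is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def split_into_points(values: Sequence[float], m_in: int, m_out: int
                      ) -> List[Tuple[List[float], List[float]]]:
    """Hypothetical helper: group a flat sequence into (output, input) pairs."""
    width = m_out + m_in
    if width <= 0:
        raise ValueError("M_in + M_out must be positive")
    if len(values) % width != 0:
        raise ValueError(f"{len(values)} values is not a multiple of M_in+M_out={width}")
    points = []
    for start in range(0, len(values), width):
        row = list(values[start:start + width])
        points.append((row[:m_out], row[m_out:]))  # output first, then input
    return points


# Assumed text format: whitespace-separated numbers.
text = "0.5 4.9 1.2\n-0.3 3.1 3.4\n"
values = [float(tok) for tok in text.split()]
for out_vec, in_vec in split_into_points(values, m_in=2, m_out=1):
    print(out_vec, in_vec)
```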

close_file()

Closes an open file stream. No more samples will be written to the file.

Return type: None

register_post_processor(config)

Registers a new post-processor and returns it.

Parameters: config (dataBox_postProc) – The configuration of the post-processor.
Return type: flx.dataBox.postProc

type dataBox_colID

Syntax: COL
Description: The configuration used to identify a data-column in a flx.dataBox.

The following types are accepted for COL:

int: An integer that specifies the ID of a data-column. The numbering of column IDs for the model output starts with zero. The numbering of the column IDs for the model input starts with the total number of output columns. Value must be positive or zero.

dict: A Python dictionary that expects the following keys:

set (type string): Specifies how id is interpreted. The value must be either full (index into the full set of data-columns), in (index into the input data-columns), or out (index into the output data-columns).

id (type int): The index of the data-column.
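The numbering rule above can be sketched as a function that translates either form of identifier into a flat column index. The name resolve_col_id and its range checks are hypothetical; the sketch only restates the documented rule that output IDs start at zero and input IDs start after the last output column.

```python
from typing import Dict, Union

ColID = Union[int, Dict[str, object]]


def resolve_col_id(col: ColID, m_in: int, m_out: int) -> int:
    """Hypothetical helper: translate a dataBox_colID-style value to a flat index."""
    n_total = m_out + m_in
    if isinstance(col, int):
        idx = col                            # already a flat column ID
    else:
        subset, k = col["set"], int(col["id"])
        if k < 0:
            raise ValueError("id must be positive or zero")
        if subset == "full":
            idx = k
        elif subset == "out":
            if k >= m_out:
                raise ValueError("output id out of range")
            idx = k
        elif subset == "in":
            if k >= m_in:
                raise ValueError("input id out of range")
            idx = m_out + k                  # input IDs start after the outputs
        else:
            raise ValueError(f"unknown set: {subset!r}")
    if not 0 <= idx < n_total:
        raise ValueError(f"column index out of range: {idx}")
    return idx


# One output and two inputs, as in the R-S example further below.
print(resolve_col_id({"set": "in", "id": 1}, m_in=2, m_out=1))   # 2
print(resolve_col_id({"set": "out", "id": 0}, m_in=2, m_out=1))  # 0
```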

Post-processors

Overview

type dataBox_postProc

Syntax:

CONFIG

Description:

The configuration used to initialize a post-processor for a flx.dataBox, where CONFIG is of type dict.

The following keys are allowed independent of the type of the post-processor:

type (dataBox_postProc_type): The type of the post-processor (required).

Additionally, depending on the specified type of the post-processor, other keys can be required for definition; see section Types of post-processors.

type dataBox_postProc_type

Syntax:

TYPE

Description:

Specifies the type of a post-processor for a flx.dataBox.

The following values/types for post-processors can be used:

counter: see section counter
mean_double: see section mean_double
mean_pdouble: see section mean_pdouble
mean_qdouble: see section mean_qdouble
vdouble: see section vdouble
reliability: see section reliability
filter: see section filter
akmcs: see section Importing data into an instance of AK-MCS

The state of a post-processor can be retrieved by means of the function flx.dataBox.postProc.eval().

class flx.dataBox.postProc

A post-processor for a flx.dataBox.

eval()

Syntax: flx.dataBox.postProc.eval()
Description: Returns the current state of the post-processor. The states of the different post-processors are documented in section Types of post-processors.

Return type: dict

Types of post-processors

Statistical analysis

`counter`

property counter

A post-processor that counts the number of samples added.

States:

When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:

N (int): The total number of samples added.

`mean_double`

property mean_double

A post-processor that tracks the mean of the data-column by accumulating the running sum in a standard floating-point variable. It is fast; however, for large sums, accuracy can become an issue due to limited floating-point precision.

Parametrization:

Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc. The following parameters are accepted:

col (dataBox_colID): An identifier for the data-column to track.

States:

When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:

mean (float): The sample mean of the tracked data-column.
N (int): The total number of samples of the tracked data-column.
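The precision issue can be reproduced with a plain running sum kept in one double-precision float. This is a hypothetical illustration of the general effect, not the fesslix implementation: once the sum is large, each small addition is rounded away.

```python
class NaiveMean:
    """Hypothetical running mean with the sum kept in one Python float (double)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.n = 0

    def add(self, x: float) -> None:
        self.total += x          # result is rounded to the nearest double
        self.n += 1

    def eval(self) -> dict:
        mean = self.total / self.n if self.n else float("nan")
        return {"mean": mean, "N": self.n}


acc = NaiveMean()
acc.add(1.0e16)
for _ in range(1000):
    acc.add(1.0)                 # 1.0 is only half the spacing of doubles near 1e16
print(acc.total == 1.0e16)       # True: all 1000 additions were rounded away
```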

`mean_pdouble`

property mean_pdouble

A post-processor that tracks the mean of the data-column based on a special floating-point variable that minimizes potential numerical summation errors (based on the Kahan summation algorithm).

Parametrization:

Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc. The following parameters are accepted:

col (dataBox_colID): An identifier for the data-column to track.

States:

When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:

mean (float): The sample mean of the tracked data-column.
N (int): The total number of samples of the tracked data-column.
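Kahan summation carries a compensation term that re-injects the low-order bits lost in each addition. The following minimal sketch applies the textbook algorithm to the same experiment in which plain summation lost every small term; it is an illustration of the technique, not fesslix source code, and the class name KahanMean is hypothetical.

```python
class KahanMean:
    """Hypothetical running mean using Kahan (compensated) summation."""

    def __init__(self) -> None:
        self.total = 0.0
        self.comp = 0.0          # rounding error of the previous addition
        self.n = 0

    def add(self, x: float) -> None:
        y = x - self.comp                   # correct the input by the last error
        t = self.total + y                  # this addition may round
        self.comp = (t - self.total) - y    # rounding error of this addition
        self.total = t
        self.n += 1

    def eval(self) -> dict:
        mean = self.total / self.n if self.n else float("nan")
        return {"mean": mean, "N": self.n}


acc = KahanMean()
acc.add(1.0e16)
for _ in range(1000):
    acc.add(1.0)
print(acc.total == 1.0e16 + 1000.0)   # True: the small terms are recovered
```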

`mean_qdouble`

property mean_qdouble

A post-processor that tracks the mean of the data-column based on a special floating-point variable that minimizes potential numerical summation errors (based on performing the summation in separate bins).

Parametrization:

Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc. The following parameters are accepted:

col (dataBox_colID): An identifier for the data-column to track.

NpV (int): A number of points; its interpretation depends on ppb. Value must be larger than zero.

ppb (bool): If True, NpV is interpreted as the number of points per summation bin. If False, NpV is interpreted as the total number of samples, and the number of bins is estimated from this number.

States:

When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:

mean (float): The sample mean of the tracked data-column.
N (int): The total number of samples of the tracked data-column.
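Summation in separate bins keeps each partial sum small, so fewer low-order bits are lost. The sketch below shows one two-level scheme corresponding to ppb = True (NpV points per bin) and compares it with a plain running sum; the exact binning scheme used by fesslix is not documented here, so BinnedMean is a hypothetical illustration only. The reference mean is computed with Python's math.fsum.

```python
import math


class BinnedMean:
    """Hypothetical two-level summation: values are summed within bins of npv
    points, and each completed bin sum is added to a separate total."""

    def __init__(self, npv: int) -> None:
        if npv <= 0:
            raise ValueError("npv must be larger than zero")
        self.npv = npv
        self.total = 0.0         # sum of completed bins
        self.bin_sum = 0.0       # sum of the current (incomplete) bin
        self.bin_count = 0
        self.n = 0

    def add(self, x: float) -> None:
        self.bin_sum += x
        self.bin_count += 1
        self.n += 1
        if self.bin_count == self.npv:      # bin full: flush it into the total
            self.total += self.bin_sum
            self.bin_sum = 0.0
            self.bin_count = 0

    def eval(self) -> dict:
        s = self.total + self.bin_sum
        return {"mean": s / self.n if self.n else float("nan"), "N": self.n}


values = [0.1] * 10**6
naive_sum = 0.0
for v in values:
    naive_sum += v
binned = BinnedMean(npv=1000)
for v in values:
    binned.add(v)

reference = math.fsum(values) / len(values)   # correctly rounded sum
print("naive error: ", abs(naive_sum / len(values) - reference))
print("binned error:", abs(binned.eval()["mean"] - reference))
```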

`vdouble`

property vdouble

A post-processor that tracks the mean and the variance of the data-column. A special floating-point variable (based on the Kahan summation algorithm) is used to increase floating-point precision.

Parametrization:

Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc. The following parameters are accepted:

col (dataBox_colID): An identifier for the data-column to track.

States:

When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:

mean (float): The sample mean of the tracked data-column.
sd (float): The sample standard deviation of the tracked data-column.
var (float): The sample variance of the tracked data-column.
N (int): The total number of samples of the tracked data-column.
rv_mean (flx.rv): A random variable quantifying the uncertainty about the mean value of the tracked data-column.
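The states mean, var, sd and N can all be updated one sample at a time. The sketch below does so with Welford's online algorithm, which is a swapped-in technique chosen for brevity (the post-processor above is documented as using Kahan summation). Whether the sample variance divides by N or N-1 is not stated here; the sketch assumes N-1, and RunningMeanVar is a hypothetical name.

```python
import math
import statistics


class RunningMeanVar:
    """Hypothetical online mean/variance via Welford's algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # sum of squared deviations from the running mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # uses the updated mean

    def eval(self) -> dict:
        # Assumption: sample variance with the (N - 1) denominator.
        var = self.m2 / (self.n - 1) if self.n > 1 else float("nan")
        return {"mean": self.mean, "var": var, "sd": math.sqrt(var), "N": self.n}


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
acc = RunningMeanVar()
for x in data:
    acc.add(x)
state = acc.eval()
# Both columns should agree up to rounding.
print(state["mean"], statistics.mean(data))
print(state["var"], statistics.variance(data))
```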

Reliability analysis

`reliability`

property reliability

A post-processor that interprets the values of the data-column as a limit-state function and tracks the reliability.

Parametrization:

Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc. The following parameters are accepted:

col (dataBox_colID): An identifier for the data-column to track.

States:

When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:

N (int): The total number of samples of the tracked data-column.
H (int): The number of samples of the tracked data-column with a limit-state function value less than or equal to zero.
mean_freq (float): Frequentist estimate of the mean value.
mean_bayes (float): Bayesian estimate of the mean value.
rv_pf (flx.rv): A random variable quantifying the uncertainty about the probability of failure of the tracked data-column.
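The counting behind N and H can be sketched as follows. Reading the two estimates as estimates of the probability of failure (the mean of the failure indicator) is an interpretation, and the Bayesian estimate here assumes a uniform Beta(1, 1) prior, whose posterior mean is (H + 1) / (N + 2). The prior actually used by fesslix is not stated in this section, and reliability_states is a hypothetical name.

```python
from typing import Iterable


def reliability_states(g_values: Iterable[float]) -> dict:
    """Hypothetical sketch: count samples and failures of a limit-state column."""
    n = 0
    h = 0
    for g in g_values:
        n += 1
        if g <= 0.0:             # failure: limit-state function value <= 0
            h += 1
    mean_freq = h / n if n else float("nan")
    # Assumption: uniform Beta(1, 1) prior on the failure probability.
    mean_bayes = (h + 1) / (n + 2)
    return {"N": n, "H": h, "mean_freq": mean_freq, "mean_bayes": mean_bayes}


print(reliability_states([-1.0, 0.5, 2.0, 3.0]))
```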

Filtering of samples

`filter`

property filter

A post-processor that stores samples (in memory) only if a condition is met.

Parametrization:

Parameters of this post-processor can be specified as additional key-value pairs in an object of type dataBox_postProc. The following parameters are accepted:

col (dataBox_colID): An identifier for the data-column to track.

N_reserve (int, default: 1000000): Maximum number of data-points that can be stored.

cond (flxParaFun, default: None): A functional expression that accepts an array of the current data-point as input. The input array is of size M_out+M_in (compare flx.dataBox); it consists of the output vector followed by the input vector of the associated flx.dataBox.

If cond is not explicitly specified, a sample is accepted if the value associated with data-column col is less than or equal to zero.

States:

When the function flx.dataBox.postProc.eval() is called on this post-processor, the following states are returned:

N (int): The total number of samples accepted by the post-processor at hand.
data (numpy.ndarray[float]): A reference to the data accepted by the post-processor at hand.
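The accept-and-store logic described above can be sketched in plain Python. FilterSketch is a hypothetical class, col is a flat column index (outputs first, then inputs), and raising an error once N_reserve is exceeded is an assumption, since the behavior in that case is not documented here.

```python
from typing import Callable, List, Optional, Sequence

Row = Sequence[float]


class FilterSketch:
    """Hypothetical filter: keep the value of column `col` for every
    data-point whose full row (outputs, then inputs) passes `cond`."""

    def __init__(self, col: int, n_reserve: int = 1_000_000,
                 cond: Optional[Callable[[Row], bool]] = None) -> None:
        self.col = col
        self.n_reserve = n_reserve
        self.cond = cond
        self.data: List[float] = []

    def add(self, row: Row) -> None:
        if self.cond is None:
            accepted = row[self.col] <= 0.0    # default: value of col <= 0
        else:
            accepted = bool(self.cond(row))
        if accepted:
            if len(self.data) >= self.n_reserve:
                # Assumption: exceeding the reserved capacity is an error here.
                raise MemoryError("N_reserve exceeded")
            self.data.append(row[self.col])

    def eval(self) -> dict:
        return {"N": len(self.data), "data": list(self.data)}


# M_out = 1 output (limit-state value) and M_in = 2 inputs per row.
rows = [[-1.0, 5.0, 6.0], [2.0, 4.0, 1.0], [0.0, 3.0, 3.0]]

pp_default = FilterSketch(col=0)                                   # default cond
pp_custom = FilterSketch(col=2, cond=lambda vec: vec[0] <= 0.0)    # explicit cond
for r in rows:
    pp_default.add(r)
    pp_custom.add(r)
print(pp_default.eval())   # {'N': 2, 'data': [-1.0, 0.0]}
print(pp_custom.eval())    # {'N': 2, 'data': [6.0, 3.0]}
```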

Examples

import fesslix as flx
flx.load_engine()
import fesslix.model_templates

import numpy as np

Random Number Generator: MT19937 - initialized with rand()=1985436705;
Random Number Generator: MT19937 - initialized with 1000 initial calls.

Write samples to a file and to memory

## ==============================================
## Generate model
## ==============================================
my_model = fesslix.model_templates.generate_reliability_R_S_example()

## ==============================================
## Set up dataBox
## ==============================================
dBox_1 = flx.dataBox(my_model['sampler'].get_NOX(),len(my_model['model']))

## ----------------------------------------
## set up writing to a file
## ----------------------------------------
dBox_1.write2file( {
    'fname': "mcs_samples.bin",
    'append': False,
    'binary': True,
    'cols': 'all'
    } )

## ----------------------------------------
## set up storing data in memory
## ----------------------------------------
dBox_1.write2mem( {
    'N_reserve': int(1e6),
    'cols': 'all'
    } )

## ----------------------------------------
## register post-processors
## ----------------------------------------
pp_1a = dBox_1.register_post_processor({ 'type':'mean_double', 'col':{ 'set':'in', 'id':0} })
pp_1b = dBox_1.register_post_processor({ 'type':'mean_pdouble', 'col':{ 'set':'in', 'id':0} })
pp_1c = dBox_1.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':0} })
pp_1d = dBox_1.register_post_processor({ 'type':'vdouble', 'col':{ 'set':'in', 'id':0} })

## ==============================================
## Perform the Monte Carlo simulation
## ==============================================
my_model['sampler'].perform_MCS(10000,my_model['model'],dBox_1)

## ==============================================
## Close the file stream of dBox_1
## ==============================================
dBox_1.close_file()

## ==============================================
## Extract a data-column from dBox_1
## ==============================================
data_fvec = dBox_1.extract_col_from_mem( { 'set':'in', 'id':1} )
print( f"mean of S: {np.mean(data_fvec):.2f}" )

## ==============================================
## Evaluate post-processors
## ==============================================

print( "pp_1a:", pp_1a.eval() )
print( "pp_1b:", pp_1b.eval() )
print( "pp_1c:", pp_1c.eval() )
pp_1d_res = pp_1d.eval()
print( "pp_1d:", pp_1d_res, pp_1d_res['rv_mean'].mean() )

mean of S: 1.03
pp_1a: {'N': 10000, 'mean': 5.008957917584272}
pp_1b: {'N': 10000, 'mean': 5.008957917584253}
pp_1c: {'mean': 5.008957917584252, 'N': 10000}
pp_1d: {'mean': 5.008957917584253, 'var': 1.0364409474900838, 'sd': 1.018057438207729, 'N': 10000, 'rv_mean': <fesslix.core.rv object at 0x74d54010c1f0>} 5.008957917584253

Import samples from a binary file#

## ==============================================
## Set up dataBox and post-processors
## ==============================================
dBox_2 = flx.dataBox(2,1)

pp_2a = dBox_2.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':0} })
pp_2b = dBox_2.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':1} })

dBox_2.write2file( {
    'fname': "mcs_samples.dat",
    'append': False,
    'binary': False,
    'cols': 'all_in'
    } )

## ==============================================
## import data from file
## ==============================================
dBox_2.read_from_file({ 'fname': "mcs_samples.bin", 'binary': True })

## ==============================================
## Evaluate post-processors
## ==============================================
print( "pp_2a:", pp_2a.eval() )
print( "pp_2b:", pp_2b.eval() )

pp_2a: {'mean': 5.00895791964531, 'N': 10000}
pp_2b: {'mean': 1.0293795454162755, 'N': 10000}

Import samples from a text-file#

## ==============================================
## Set up dataBox and post-processors
## ==============================================
dBox_3 = flx.dataBox(2,0)

pp_3a = dBox_3.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':0} })
pp_3b = dBox_3.register_post_processor({ 'type':'mean_qdouble', 'col':{ 'set':'in', 'id':1} })

## ==============================================
## import data from file
## ==============================================
dBox_3.read_from_file({ 'fname': "mcs_samples.dat", 'binary': False })

## ==============================================
## Evaluate post-processors
## ==============================================
print( "pp_3a:", pp_3a.eval() )
print( "pp_3b:", pp_3b.eval() )

pp_3a: {'mean': 5.0089579216999995, 'N': 10000}
pp_3b: {'mean': 1.0293795398482901, 'N': 10000}

Store samples based on a condition#

## ==============================================
## Set up dataBox and post-processors
## ==============================================
dBox_4 = flx.dataBox(my_model['sampler'].get_NOX(),len(my_model['model']))

## ----------------------------------------------
## use default for 'cond'
## ----------------------------------------------
## if 'cond' is not specified, the default behavior is to accept only values <=0.
pp_4a = dBox_4.register_post_processor({ 
        'type':'filter', 
        'col':{ 'set':'out', 'id':0}
    })

## ----------------------------------------------
## set a FlxFunction for 'cond'
## ----------------------------------------------
## the same condition can also be stated explicitly ('$1' is the first entry of the data-point array, i.e., the model output):
pp_4b = dBox_4.register_post_processor({ 
        'type':'filter', 
        'col':{ 'set':'in', 'id':1},         
        'cond': "$1<=0."
    })

## ----------------------------------------------
## use a Python function for 'cond'
## ----------------------------------------------
## as an alternative, a Python function can be used
## However, this is likely less efficient
def help_fun_4c(vec):
    return vec[0]<=0.
pp_4c = dBox_4.register_post_processor({ 
        'type':'filter', 
        'col':{ 'set':'in', 'id':1},         
        'cond': help_fun_4c
    })

## ==============================================
## Perform the Monte Carlo simulation
## ==============================================
my_model['sampler'].perform_MCS(int(1e5),my_model['model'],dBox_4)

## ==============================================
## Evaluate post-processors
## ==============================================
print( "pp_4a:", pp_4a.eval() )
print( "pp_4b:", pp_4b.eval() )
print( "pp_4c:", pp_4c.eval() )

pp_4a: {'N': 3563, 'p': 0.03563, 'data': array([-1.4622718 , -0.21617003, -3.2868843 , ..., -2.7884986 ,
       -0.67784244, -0.57807434], shape=(3563,), dtype=float32)}
pp_4b: {'N': 3563, 'p': 0.03563, 'data': array([4.212464 , 5.629917 , 7.4938693, ..., 7.261809 , 4.4445415,
       4.2936106], shape=(3563,), dtype=float32)}
pp_4c: {'N': 3563, 'p': 0.03563, 'data': array([4.212464 , 5.629917 , 7.4938693, ..., 7.261809 , 4.4445415,
       4.2936106], shape=(3563,), dtype=float32)}
