tools - a collection of useful features#
Todo
Write this section.
import fesslix as flx
flx.load_engine()
import fesslix.tools
import fesslix.model_templates as flx_model_templates
import fesslix.plot as flx_plot
import matplotlib.pyplot as plt
import numpy as np
Random Number Generator: MT19937 - initialized with rand()=407372254;
Random Number Generator: MT19937 - initialized with 1000 initial calls.
Working with files#
- fesslix.tools.replace_in_template()#
- Syntax:
fesslix.tools.replace_in_template(fn_in, fn_out, dmap, var_indi_start="@{", var_indi_end="}")
- Description:
Replaces expressions of the type @{VARNAME} in file fn_in with the values in dmap and writes the processed file to fn_out.
If the expression in VARNAME starts with a !, the characters after the ! are interpreted as an expression of type rvFullID and the value of the associated random variable is inserted.
- Parameters:
fn_in (str) – File name of the template input file.
fn_out (str) – File name of the output file to generate.
dmap (dict) – A dictionary of all variables that can potentially appear in the file fn_in.
var_indi_start (str) – Unique expression (w.r.t. structure of file) for the beginning of the expression to replace.
var_indi_end (str) – Unique expression (w.r.t. structure of file) for the ending of the expression to replace.
- Return type:
None
Example:
## ================================================================
## Generate a dictionary
## ================================================================
dmap = {}
dmap['var1'] = 42.42
dmap['var2'] = "Hello world!"
## ================================================================
## Generate a set of random variables
## ================================================================
config_rv_a1 = { 'name':'rv1', 'type':'stdn' }
config_rv_a2 = { 'name':'rv2', 'type':'logn', 'mu':1., 'sd':2. }
rv_set_a = flx.rv_set( {'name':'rv_set_a'}, [ config_rv_a1, config_rv_a2 ] )
sampler_a = flx.sampler(['rv_set_a'])
sampler_a.sample()
## ================================================================
## Open template file and start replacing
## ================================================================
fesslix.tools.replace_in_template(fn_in="../data/sample_text.txt", fn_out="modified_text.txt", dmap=dmap)
The content of the template file is:
!cat ../data/sample_text.txt
@{var2} » This is an example text file @{var1}.
The content is to be modified by »fesslix.tools.replace_in_template«.
Value of random variable: @{!rv_set_a::rv1}
The content of the generated output file is:
!cat modified_text.txt
Hello world! » This is an example text file 42.42.
The content is to be modified by »fesslix.tools.replace_in_template«.
Value of random variable: -0.110861
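The substitution step described above can be illustrated without Fesslix. The following is a minimal sketch, not the library's implementation: the helper name replace_placeholders is hypothetical, it works on a string instead of files, it formats values with str() (Fesslix may format numbers differently), and it leaves expressions starting with ! untouched because resolving them requires a loaded Fesslix engine.

```python
import re

def replace_placeholders(text, dmap, var_indi_start="@{", var_indi_end="}"):
    """Replace every START + NAME + END occurrence in text with str(dmap[NAME]).

    Hypothetical helper for illustration only; it is not part of fesslix.
    """
    pattern = re.compile(
        re.escape(var_indi_start) + r"(.*?)" + re.escape(var_indi_end)
    )

    def lookup(match):
        name = match.group(1)
        if name.startswith("!"):
            # Random-variable references (rvFullID) need a Fesslix engine;
            # this sketch leaves them unchanged.
            return match.group(0)
        return str(dmap[name])

    return pattern.sub(lookup, text)

template = "@{var2} » This is an example text file @{var1}."
result = replace_placeholders(template, {"var1": 42.42, "var2": "Hello world!"})
print(result)
```

Escaping the start and end indicators with re.escape keeps characters such as { from being interpreted as regular-expression syntax, which is why arbitrary indicator strings can be passed in.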
Discretization#
- fesslix.tools.discretize_x()#
- Syntax:
fesslix.tools.discretize_x(x_low, x_up, x_disc_N=int(1e3), x_disc_shift=False, x_disc_on_log=False)
- Description:
Discretizes a domain between x_low and x_up either linearly or in log scale.
- Parameters:
x_low (float) – Lower bound of the discretization interval.
x_up (float) – Upper bound of the discretization interval.
x_disc_N (int) – Number of discretization points to generate.
x_disc_shift (bool) – If True, the start and end points are shifted by half the mesh size so that x_low and x_up are not part of the returned set of discretization points.
x_disc_on_log (bool) – If True, the discretization is performed in log-space (and both x_low and x_up must be larger than zero).
- Returns:
The x_disc_N points of discretization.
- Return type:
Example:
print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=True, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0.01, x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=True) )
[0. 0.25 0.5 0.75 1. ]
[0.1 0.3 0.5 0.7 0.9]
[0.01 0.03162278 0.1 0.31622777 1. ]
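The three printed grids can be reproduced with a few lines of plain Python. This is a sketch under assumptions, not the Fesslix source: the function name discretize_sketch is hypothetical, it returns a list instead of a NumPy array, and the behaviour of a shifted grid in log-space (mid-points taken in log-space) is a guess that the example above does not cover.

```python
import math

def discretize_sketch(x_low, x_up, n, shift=False, on_log=False):
    """Return n grid points between x_low and x_up (linear or log-spaced).

    Hypothetical helper for illustration only; it is not part of fesslix.
    """
    if on_log:
        if x_low <= 0.0 or x_up <= 0.0:
            raise ValueError("log-space discretization needs positive bounds")
        lo, up = math.log(x_low), math.log(x_up)
    else:
        lo, up = x_low, x_up
    if shift:
        # n cells of equal width; return the cell mid-points, so that the
        # bounds themselves are not part of the grid
        dx = (up - lo) / n
        pts = [lo + (i + 0.5) * dx for i in range(n)]
    else:
        # n points including both bounds (requires n >= 2)
        dx = (up - lo) / (n - 1)
        pts = [lo + i * dx for i in range(n)]
    # map back from log-space if required
    return [math.exp(p) for p in pts] if on_log else pts

print(discretize_sketch(0.0, 1.0, 5))
print(discretize_sketch(0.0, 1.0, 5, shift=True))
print(discretize_sketch(0.01, 1.0, 5, on_log=True))
```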
- fesslix.tools.discretize_x_get_diff()#
- Syntax:
fesslix.tools.discretize_x_get_diff(x_low, x_up, x_disc_N=int(1e3), x_disc_on_log=False)
- Description:
Discretizes a domain between x_low and x_up. In addition to the discretization points, information about the grid size is returned.
Internally, this function calls fesslix.tools.discretize_x() with x_disc_shift=True. The returned discretization points x are then interpreted as the mid-points of a discretization grid and the size of each grid element is returned as a second vector dx.
- Parameters:
x_low (float) – Lower bound of the discretization interval.
x_up (float) – Upper bound of the discretization interval.
x_disc_N (int) – Number of discretization points to generate.
x_disc_on_log (bool) – If True, the discretization is performed in log-space (and both x_low and x_up must be larger than zero).
- Returns:
(x, dx) » x: mid-points of grid elements; dx: size of grid elements
- Return type:
Example:
x, dx = fesslix.tools.discretize_x_get_diff(x_low=0., x_up=1., x_disc_N=5, x_disc_on_log=False)
print(x)
print(dx)
[0.1 0.3 0.5 0.7 0.9]
[0.2 0.2 0.2 0.2 0.2]
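A pair of mid-points and element sizes is the natural input for a midpoint-rule integration. The sketch below shows that use for the linear case only; the helper names midpoints_and_widths and std_normal_pdf are hypothetical and the code does not call Fesslix.

```python
import math

def midpoints_and_widths(x_low, x_up, n):
    """Mid-points and widths of n equally sized cells on [x_low, x_up].

    Hypothetical helper for illustration only; it is not part of fesslix.
    """
    dx = (x_up - x_low) / n
    x = [x_low + (i + 0.5) * dx for i in range(n)]
    return x, [dx] * n

def std_normal_pdf(z):
    """PDF of the standard Normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Same grid as in the example above
x5, dx5 = midpoints_and_widths(0.0, 1.0, 5)
print(x5)
print(dx5)

# Midpoint rule: sum of f(mid-point) * cell width approximates the integral
x, dx = midpoints_and_widths(-6.0, 6.0, 1000)
integral = sum(std_normal_pdf(xi) * dxi for xi, dxi in zip(x, dx))
print(integral)
```

Because the returned points lie inside the cells, the integrand is never evaluated at x_low or x_up, which is convenient for densities that are singular or undefined at the bounds.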
- fesslix.tools.discretize_stdNormal_space()#
- Syntax:
fesslix.tools.discretize_stdNormal_space(q_low=1e-3, q_up=None, x_disc_N=int(1e3))
- Description:
Returns equally spaced discretization points in the standard Normal domain. The parameters q_low and q_up are used to define the bounds in standard Normal space.
- Parameters:
- Returns:
The x_disc_N points of discretization.
- Return type:
Example:
print( fesslix.tools.discretize_stdNormal_space( x_disc_N=5 ) )
[-3.09023231 -1.54511615 0. 1.54511615 3.09023231]
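The printed grid is consistent with mapping q_low and q_up through the inverse standard Normal CDF and spacing the points equally between the two results. The sketch below follows that reading using only the Python standard library. The function name std_normal_grid is hypothetical, and the default q_up = 1 - q_low is an assumption inferred from the symmetric output above, not a documented rule.

```python
from statistics import NormalDist

def std_normal_grid(q_low=1e-3, q_up=None, n=1000):
    """Equally spaced points between the q_low and q_up standard Normal quantiles.

    Hypothetical helper for illustration only; it is not part of fesslix.
    """
    if q_up is None:
        # Assumption: a missing upper probability mirrors the lower one
        q_up = 1.0 - q_low
    std_normal = NormalDist()  # mean 0, standard deviation 1
    y_low = std_normal.inv_cdf(q_low)
    y_up = std_normal.inv_cdf(q_up)
    step = (y_up - y_low) / (n - 1)
    return [y_low + i * step for i in range(n)]

grid = std_normal_grid(n=5)
print(grid)
```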
- fesslix.tools.detect_bounds_x()#
- Syntax:
fesslix.tools.detect_bounds_x(rv, config_dict, q_low=1e-3, q_up=None, mode='ignore')
- Description:
Evaluates and sets the bounds x_low and x_up of random variable rv based on the quantile values q_low and q_up. Whether existing values are overwritten is controlled by mode.
- Parameters:
rv (flx.rv) – Random variable.
config_dict (flx_plot_config) – Configuration dictionary » This function ensures that the parameters x_low and x_up are assigned.
q_low (flx_pr) – Quantile value for lower bound.
q_up (flx_pr) – Quantile value for upper bound.
mode (str) – Controls how existing values of x_low and x_up in config_dict are handled. The following keywords are allowed:
  'ignore': ignore bounds of rv, if x_low and x_up are already set in config_dict
  'overwrite': use bounds of rv, even if x_low and x_up are already set in config_dict
  'minmax': use smallest value for bounds, if x_low and x_up are already set in config_dict
- Returns:
None
Example:
rv = flx.rv({'type':'stdn'})
era_dict = { }
fesslix.tools.detect_bounds_x(rv, era_dict,q_low=1e-6,q_up=0.99)
print(era_dict)
{'x_low': -4.753424308822899, 'x_up': 2.3263478740408408}
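The two values in the printed dictionary are the 1e-6 and 0.99 quantiles of the standard Normal distribution. The sketch below reproduces them with the standard library and imitates the 'ignore' and 'overwrite' modes. It is an illustration under assumptions: the function name detect_bounds_sketch is hypothetical, it takes an inverse-CDF callable instead of a flx.rv object, it decides per key whether a value is already set, and it does not implement 'minmax'.

```python
from statistics import NormalDist

def detect_bounds_sketch(inv_cdf, config_dict, q_low, q_up, mode="ignore"):
    """Store quantile-based bounds x_low and x_up in config_dict.

    Hypothetical helper for illustration only; it is not part of fesslix.
    """
    if mode not in ("ignore", "overwrite"):
        raise NotImplementedError("this sketch covers 'ignore' and 'overwrite' only")
    candidates = {"x_low": inv_cdf(q_low), "x_up": inv_cdf(q_up)}
    for key, value in candidates.items():
        # 'ignore' keeps values that are already present,
        # 'overwrite' always replaces them
        if key not in config_dict or mode == "overwrite":
            config_dict[key] = value

std_normal = NormalDist()

bounds = {}
detect_bounds_sketch(std_normal.inv_cdf, bounds, q_low=1e-6, q_up=0.99)
print(bounds)

preset = {"x_low": -2.0}
detect_bounds_sketch(std_normal.inv_cdf, preset, q_low=1e-6, q_up=0.99, mode="ignore")
print(preset)  # x_low keeps its preset value, x_up is added
```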
Data fitting#
- fesslix.tools.discretize_x_from_data()#
- Syntax:
fesslix.tools.discretize_x_from_data(data, config={}, data_is_sorted=False, lower_bound=None, upper_bound=None)
- Description:
Discretize the parameter space into bins based on a data array.
- Parameters:
data (numpy.ndarray) – vector of data/samples
config (dict) – configuration dictionary
data_is_sorted (bool) – Set this to True if the values in data are sorted from smallest to largest.
lower_bound (float | None) – Value of an absolute lower bound. Set lower_bound to None if a lower bound does not exist.
upper_bound (float | None) – Value of an absolute upper bound. Set upper_bound to None if an upper bound does not exist.
- Configuration dictionary:
The following keys are allowed in the configuration dictionary config:
mode (Word, default: adaptive): Mode for the discretization of data into bins. The following modes for discretization are supported:
  adaptive: The bin size is selected adaptively based on a minimum number of data-points per bin and on a minimum bin size. For this mode, the following keys are additionally accepted in the configuration dictionary:
    N_points_per_bin_min (int, default: 100): Minimum number of data-points per bin. Any bin must contain at least N_points_per_bin_min data-points. The specified integer value must be positive.
    dx_min (float): Minimum size of a bin in parameter space. Any bin must have at least a width of dx_min. If dx_min is not specified, the default value is assigned such that at most 8 bins fit into the interval spanned by the 75% and 25% quantile.
  equidist_p: An equidistant grid in probability space is used to generate the bins. For this mode, the following keys are additionally accepted in the configuration dictionary:
    N_bins (int, optional): Total number of bins.
    N_points_per_bin (int, default: 100): Number of data-points per bin. This parameter is only considered if N_bins is not specified.
  fixed_p: The user provides the grid layout. For this mode, the following key must be specified in the configuration dictionary:
    p_vec (numpy.ndarray): A numpy array with the probabilities of the discretization points of the grid (i.e., the edges of the bins). The first entry in p_vec must be zero and the last entry must equal one.
tail_upper (dict, default: None): Sets information about the location of the start of the upper tail. The following key-value pairs are accepted:
  p (float): Probability that a value is smaller than or equal to the starting value of the tail.
  x (float): Starting value of the tail.
  data (numpy.ndarray, optional): A vector of samples in the tail. If specified, these samples are used to fit the tail instead of the samples in the global data array.
tail_lower (dict, default: None): Sets information about the location of the start of the lower tail. Configuration is identical to tail_upper.
- Returns:
A Python dictionary that describes the discretization of the data into bins.
The returned dict has the following structure:
N_total (int): The total number of data-points in the array specified by the input parameter data.
N_bins (int): The total number of bins generated.
q_vec (numpy.ndarray): Vector of quantiles of the edges of the bins (of size N_bins+1).
p_vec (numpy.ndarray): Vector of probabilities associated with the values in q_vec (of size N_bins+1).
N_vec (numpy.ndarray): Number of data-points that fall into the individual bins (of size N_bins+1).
tail_upper (dict): A configuration dictionary for modelling the upper tail. The structure of the dict corresponds to the one returned by fesslix.tools.fit_tail_to_data().
tail_lower (dict): A configuration dictionary for modelling the lower tail. The structure of the dict corresponds to the one returned by fesslix.tools.fit_tail_to_data().
type (Word): Set to quantiles, so that the returned configuration dictionary can directly be used to generate a quantile-based Distribution.
interpol (Word): Mode of interpolation. For documentation, please see key interpol in the configuration of quantile-based Distribution.
use_tail_fit (bool): … see key use_tail_fit in the configuration of quantile-based Distribution.
bin_rvbeta_params (bool): … see key bin_rvbeta_params in the configuration of quantile-based Distribution.
bin_rvlinear_params (bool): … see key bin_rvlinear_params in the configuration of quantile-based Distribution.
The returned dict can directly (i.e., without modification) be used as configuration dictionary to generate a quantile-based Distribution.
- Return type:
- Examples:
Usage of this function is demonstrated in the examples of the quantile-based Distribution.
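To make the idea behind the equidist_p mode concrete, the following sketch bins a sample on an equidistant grid in probability space and fills the keys N_total, N_bins, q_vec, p_vec and N_vec. It is a simplified illustration, not the Fesslix algorithm: the function name equidist_p_bins is hypothetical, the quantile rule (linear interpolation between sorted samples) is an assumption, tails are not fitted, plain lists are used instead of NumPy arrays, and N_vec holds one count per bin.

```python
import bisect
import random

def equidist_p_bins(data, n_bins):
    """Bin data using bin edges that are equidistant in probability space.

    Hypothetical helper for illustration only; it is not part of fesslix.
    """
    xs = sorted(data)
    n = len(xs)
    # probabilities of the bin edges: 0, 1/n_bins, ..., 1
    p_vec = [i / n_bins for i in range(n_bins + 1)]
    # empirical quantiles at the bin edges (assumed interpolation rule)
    q_vec = []
    for p in p_vec:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        q_vec.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    # count the data-points per bin; the largest sample goes into the last bin
    n_vec = [0] * n_bins
    for x in xs:
        idx = min(bisect.bisect_right(q_vec, x) - 1, n_bins - 1)
        n_vec[idx] += 1
    return {"N_total": n, "N_bins": n_bins,
            "q_vec": q_vec, "p_vec": p_vec, "N_vec": n_vec}

rng = random.Random(0)  # fixed seed, so the sketch is reproducible
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
bins = equidist_p_bins(data, n_bins=10)
print(bins["N_vec"])
```

With this construction every bin receives roughly the same number of data-points, so bins are narrow where the data is dense and wide in sparsely populated regions.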
- fesslix.tools.fit_tail_to_data()#
- Syntax:
fesslix.tools.fit_tail_to_data(tail_data_transformed, bound=None)
- Description:
Fits a probabilistic model to the data-points associated with the tails of a distribution.
This function is called internally by fesslix.tools.discretize_x_from_data() for fitting the lower and upper tail to data.
- Parameters:
tail_data_transformed (numpy.ndarray) – The data-points associated with the tail. The values need to be transformed such that they are all positive. A value of zero is associated with the cutting-quantile of the tail. The larger the value, the further the point is in the tail.
bound (float) – A value specifying an absolute upper value (bound) for the tail. The value needs to be transformed as the data in tail_data_transformed.
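One transformation that satisfies the requirements on tail_data_transformed is a shift relative to the cutting quantile, mirrored for the lower tail. The sketch below shows this choice; it is an assumption about how the data could be prepared, not a statement of what Fesslix does internally, and the function name transform_tail is hypothetical.

```python
def transform_tail(data, x_cut, upper=True, bound=None):
    """Shift tail samples so that x_cut maps to zero and all values are positive.

    Hypothetical helper for illustration only; it is not part of fesslix.
    """
    if upper:
        # upper tail: distance above the cutting quantile
        tail = [x - x_cut for x in data if x > x_cut]
        bound_t = None if bound is None else bound - x_cut
    else:
        # lower tail: distance below the cutting quantile
        tail = [x_cut - x for x in data if x < x_cut]
        bound_t = None if bound is None else x_cut - bound
    return tail, bound_t

samples = [0.5, 1.0, 2.5, 3.0, 4.5]
upper_tail, upper_bound = transform_tail(samples, x_cut=2.0, upper=True, bound=10.0)
lower_tail, lower_bound = transform_tail(samples, x_cut=2.0, upper=False, bound=0.0)
print(upper_tail, upper_bound)
print(lower_tail, lower_bound)
```

The bound is passed through the same mapping as the samples, so it remains an upper limit of the transformed tail in both cases.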
- Supported probabilistic models for the tail:
Currently, the following probabilistic distribution models for the tail are supported:
genpareto » generalized Pareto distribution
logn » Log-normal distribution
beta » Beta distribution (only if bound is not None)
- Returns:
A Python dictionary that contains the configuration for the probabilistic model of the tail.
The returned dict has the following structure:
models (dict): A Python dictionary that contains all fitted models. All supported models are listed above. The key (of type flx_rv_type) corresponds to the respective model. The value is a Python dictionary with the following structure:
  type (flx_rv_type): The type of the tail model.
  pdf_0 (float): The value of the PDF at zero.
  nll (float): Value of the negative log-likelihood of the data (w.r.t. the fitted probabilistic model).
  kstest_D (float): KS test statistic
  kstest_p (float): p value from KS test
  Additionally, for each probabilistic model, all parameters are specified such that the returned dict can be used as configuration dict to define a random variable (flx.rv.__init__()).
best_model (flx_rv_type): A reference to the model in models with the smallest value of the negative log-likelihood (nll).
use_model (flx_rv_type): A reference to the model in models that should actually be used as probabilistic model for the tail. By default, this value is set equal to the value of best_model. This value needs to be modified by the user if a model different from the one with the largest likelihood should be used. If use_model is set to "None" (as string), no probabilistic distribution model is associated with the tail.
- Return type:
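The selection rule behind best_model (smallest negative log-likelihood) can be demonstrated with two models that have closed-form maximum-likelihood fits. The sketch below is not the Fesslix fitting routine. It fits a log-normal model and, as a stand-in that is not in the list of supported models, an exponential model under the hypothetical key expon. The dictionary only mimics the keys models, nll, best_model and use_model, and the data are deterministic exponential quantiles generated for the illustration.

```python
import math

def nll_lognormal(x):
    """Negative log-likelihood of a log-normal model at its ML estimate."""
    logs = [math.log(v) for v in x]
    n = len(x)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    return sum(
        l + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
        + (l - mu) ** 2 / (2.0 * sigma ** 2)
        for l in logs
    )

def nll_exponential(x):
    """Negative log-likelihood of an exponential model at its ML estimate."""
    n = len(x)
    mean = sum(x) / n
    return n * (math.log(mean) + 1.0)

# Illustration data: quantiles of a unit exponential distribution
n = 200
tail_data = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]

fit = {
    "models": {
        "logn": {"type": "logn", "nll": nll_lognormal(tail_data)},
        # 'expon' is a hypothetical stand-in used only in this sketch
        "expon": {"type": "expon", "nll": nll_exponential(tail_data)},
    }
}
# best_model: key of the model with the smallest negative log-likelihood
fit["best_model"] = min(fit["models"], key=lambda k: fit["models"][k]["nll"])
# use_model defaults to best_model and may be changed afterwards
fit["use_model"] = fit["best_model"]
print(fit["best_model"], fit["use_model"])
```

Keeping use_model separate from best_model lets the likelihood ranking stay intact when a different model is preferred for other reasons, for example a better KS test result.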
- fesslix.tools.fit_pdf_based_on_qvec()#
- Syntax:
fesslix.tools.fit_pdf_based_on_qvec(data, config)
- Description:
Fits a PDF based on linear interpolation to a data vector.
- Parameters:
data (numpy.ndarray) – vector of data/samples
config (dict) – Configuration dictionary, as returned by
fesslix.tools.discretize_x_from_data().
- Returns:
None
- Modification of the configuration dictionary:
The configuration dictionary config is extended by this function. Specifically, the key pdf_vec is added and the key interpol is changed to pdf_linear (compare quantile-based Distribution).
- Examples:
Usage of this function is demonstrated in the examples of the quantile-based Distribution.