tools - a collection of useful features

Todo

Write this section.

import fesslix as flx
flx.load_engine()
import fesslix.tools
import fesslix.model_templates as flx_model_templates
import fesslix.plot as flx_plot

import matplotlib.pyplot as plt
import numpy as np
Random Number Generator: MT19937 - initialized with rand()=407372254;
Random Number Generator: MT19937 - initialized with 1000 initial calls.

Working with files

fesslix.tools.replace_in_template()
Syntax:

fesslix.tools.replace_in_template(fn_in, fn_out, dmap, var_indi_start="@{", var_indi_end="}")

Description:

Replaces expressions of the type @{VARNAME} in the file fn_in with the corresponding values in dmap and writes the processed file to fn_out.

If VARNAME starts with a !, the characters after the ! are interpreted as an expression of type rvFullID, and the value of the associated random variable is inserted.

Parameters:
  • fn_in (str) – File name of the template input file.

  • fn_out (str) – File name of the output file to generate.

  • dmap (dict) – A dictionary of all variables that can potentially appear in the file fn_in.

  • var_indi_start (str) – Marker for the beginning of an expression to replace (default: @{). It must be unique with respect to the structure of the file.

  • var_indi_end (str) – Marker for the end of an expression to replace (default: }). It must be unique with respect to the structure of the file.

Return type:

None

Example:

## ================================================================
## Generate a dictionary
## ================================================================
dmap = {}
dmap['var1'] = 42.42
dmap['var2'] = "Hello world!"

## ================================================================
## Generate a set of random variables
## ================================================================
config_rv_a1 = { 'name':'rv1', 'type':'stdn' }
config_rv_a2 = { 'name':'rv2', 'type':'logn', 'mu':1., 'sd':2. }
rv_set_a = flx.rv_set( {'name':'rv_set_a'}, [ config_rv_a1, config_rv_a2 ] )
sampler_a = flx.sampler(['rv_set_a'])
sampler_a.sample()
    
## ================================================================
## Open template file and start replacing
## ================================================================
fesslix.tools.replace_in_template(fn_in="../data/sample_text.txt", fn_out="modified_text.txt", dmap=dmap)

The content of the template file is:

!cat ../data/sample_text.txt
@{var2} » This is an example text file @{var1}. 

The content is to be modified by »fesslix.tools.replace_in_template«.

Value of random variable: @{!rv_set_a::rv1}

The content of the generated output file is:

!cat modified_text.txt
Hello world! » This is an example text file 42.42. 

The content is to be modified by »fesslix.tools.replace_in_template«.

Value of random variable: -0.110861
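The substitution rule described above can be sketched in plain Python with the re module. This is a minimal, hypothetical re-implementation that works on a string instead of files; the name replace_in_template_sketch and the rv_lookup callback are assumptions introduced for illustration and are not part of fesslix, which resolves !-expressions against its own random variables.

```python
import re

def replace_in_template_sketch(text, dmap, var_indi_start="@{", var_indi_end="}", rv_lookup=None):
    """Hypothetical sketch of the substitution performed by
    fesslix.tools.replace_in_template(), applied to a string instead of files."""
    # Non-greedy match of everything between the start and end markers
    pattern = re.compile(re.escape(var_indi_start) + r"(.*?)" + re.escape(var_indi_end))

    def substitute(match):
        name = match.group(1)
        if name.startswith("!"):
            # '!' prefix: the rest is an rvFullID, resolved by a user-supplied callback here
            return str(rv_lookup(name[1:]))
        # Ordinary variable: looked up in dmap (unknown names raise KeyError in this sketch)
        return str(dmap[name])

    return pattern.sub(substitute, text)


template = "@{var2} » This is an example text file @{var1}.\nValue of random variable: @{!rv_set_a::rv1}"
dmap = {'var1': 42.42, 'var2': "Hello world!"}
rv_values = {'rv_set_a::rv1': -0.110861}  # stand-in for the sampled value from the example above
result = replace_in_template_sketch(template, dmap, rv_lookup=rv_values.get)
print(result)
```

Choosing other markers, e.g. var_indi_start="<<" and var_indi_end=">>", only changes the compiled pattern; re.escape keeps characters such as { from being read as regex syntax.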

Discretization

fesslix.tools.discretize_x()
Syntax:

fesslix.tools.discretize_x(x_low, x_up, x_disc_N=int(1e3), x_disc_shift=False, x_disc_on_log=False)

Description:

Discretizes the interval between x_low and x_up, either on a linear or on a logarithmic scale.

Parameters:
  • x_low (float) – Lower bound of the discretization interval.

  • x_up (float) – Upper bound of the discretization interval.

  • x_disc_N (int) – Number of discretization points to generate.

  • x_disc_shift (bool) – If True, the start and end points are shifted inward by half the mesh size, so that x_low and x_up are not part of the returned set of discretization points.

  • x_disc_on_log (bool) – If True, the discretization is performed in log-space (and both x_low and x_up must be strictly positive).

Returns:

The x_disc_N discretization points.

Return type:

numpy.ndarray

Example:

print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=True, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0.01, x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=True) )
[0.   0.25 0.5  0.75 1.  ]
[0.1 0.3 0.5 0.7 0.9]
[0.01       0.03162278 0.1        0.31622777 1.        ]
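
The spacing rules visible in this output can be reproduced with NumPy. The following sketch is inferred from the example output only and is not the fesslix source; the function name discretize_x_sketch is hypothetical, and how x_disc_shift combines with x_disc_on_log is an assumption (here the shift is applied in log-space).

```python
import numpy as np

def discretize_x_sketch(x_low, x_up, x_disc_N=int(1e3), x_disc_shift=False, x_disc_on_log=False):
    """Hypothetical NumPy sketch mirroring the documented behavior of fesslix.tools.discretize_x()."""
    if x_disc_on_log:
        if x_low <= 0. or x_up <= 0.:
            raise ValueError("x_low and x_up must be strictly positive for x_disc_on_log=True")
        a, b = np.log10(x_low), np.log10(x_up)   # work in log10-space
    else:
        a, b = x_low, x_up
    if x_disc_shift:
        h = (b - a) / x_disc_N                   # mesh size of x_disc_N equal cells
        t = np.linspace(a + 0.5*h, b - 0.5*h, x_disc_N)  # cell mid-points, bounds excluded
    else:
        t = np.linspace(a, b, x_disc_N)          # bounds included
    return 10.**t if x_disc_on_log else t

print(discretize_x_sketch(0., 1., 5))
print(discretize_x_sketch(0., 1., 5, x_disc_shift=True))
print(discretize_x_sketch(0.01, 1., 5, x_disc_on_log=True))
```

Without the shift, the spacing is (x_up-x_low)/(x_disc_N-1); with the shift, it is (x_up-x_low)/x_disc_N, which is why the shifted points are mid-points of equal cells.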

fesslix.tools.discretize_x_get_diff()
Syntax:

fesslix.tools.discretize_x_get_diff(x_low, x_up, x_disc_N=int(1e3), x_disc_on_log=False)

Description:

Discretizes a domain between x_low and x_up. In addition to the discretization points, information about the grid size is returned.

Internally, this function calls fesslix.tools.discretize_x() with x_disc_shift=True. The returned discretization points x are then interpreted as the mid-points of a discretization grid and the size of each grid element is returned as a second vector dx.

Parameters:
  • x_low (float) – Lower bound of the discretization interval.

  • x_up (float) – Upper bound of the discretization interval.

  • x_disc_N (int) – Number of discretization points to generate.

  • x_disc_on_log (bool) – If True, the discretization is performed in log-space (and both x_low and x_up must be larger than zero).

Returns:

(x, dx) » x: mid-points of the grid elements; dx: sizes of the grid elements

Return type:

(numpy.ndarray, numpy.ndarray)

Example:

x, dx = fesslix.tools.discretize_x_get_diff(x_low=0., x_up=1., x_disc_N=5, x_disc_on_log=False) 
print(x)
print(dx)
[0.1 0.3 0.5 0.7 0.9]
[0.2 0.2 0.2 0.2 0.2]
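
Because x holds mid-points and dx the element sizes, the pair is directly suitable for midpoint-rule integration, sum(f(x)*dx). The sketch below rebuilds (x, dx) for the linear case from explicit element edges with NumPy and uses the result to integrate x² over [0, 1]. The helper name is hypothetical, and the log-space case is not covered.

```python
import numpy as np

def discretize_x_get_diff_sketch(x_low, x_up, x_disc_N=int(1e3)):
    """Hypothetical sketch of the linear case of fesslix.tools.discretize_x_get_diff()."""
    edges = np.linspace(x_low, x_up, x_disc_N + 1)   # boundaries of the grid elements
    x = 0.5 * (edges[:-1] + edges[1:])                # mid-points of the grid elements
    dx = np.diff(edges)                               # size of each grid element
    return x, dx

x5, dx5 = discretize_x_get_diff_sketch(0., 1., x_disc_N=5)
print(x5)    # mid-points, as in the fesslix example above
print(dx5)

# Midpoint-rule approximation of the integral of x**2 over [0, 1] (exact value: 1/3)
x, dx = discretize_x_get_diff_sketch(0., 1., x_disc_N=1000)
integral = np.sum(x**2 * dx)
print(integral)
```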

fesslix.tools.discretize_stdNormal_space()
Syntax:

fesslix.tools.discretize_stdNormal_space(q_low=1e-3, q_up=None, x_disc_N=int(1e3))

Description:

Returns equally spaced discretization points in the standard Normal domain. The bounds of the domain are the standard Normal quantiles associated with the probabilities q_low and q_up.

Parameters:
  • q_low (float) – Probability level of the lower bound.

  • q_up (float) – Probability level of the upper bound. If None is specified, q_up=1.-q_low is assigned.

  • x_disc_N (int) – Number of discretization points to generate.

Returns:

The x_disc_N discretization points.

Return type:

numpy.ndarray

Example:

print( fesslix.tools.discretize_stdNormal_space( x_disc_N=5 ) )
[-3.09023231 -1.54511615  0.          1.54511615  3.09023231]
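
The same points can be obtained by mapping q_low and q_up to standard Normal quantiles and spacing the points evenly in between. This sketch uses statistics.NormalDist from the Python standard library as the inverse CDF; it illustrates the documented behavior and is not the fesslix implementation (the function name is hypothetical).

```python
from statistics import NormalDist
import numpy as np

def discretize_stdNormal_space_sketch(q_low=1e-3, q_up=None, x_disc_N=int(1e3)):
    """Hypothetical sketch mirroring fesslix.tools.discretize_stdNormal_space()."""
    if q_up is None:
        q_up = 1. - q_low                      # documented default for q_up
    std_normal = NormalDist(mu=0., sigma=1.)
    u_low = std_normal.inv_cdf(q_low)          # lower bound in standard Normal space
    u_up = std_normal.inv_cdf(q_up)            # upper bound in standard Normal space
    return np.linspace(u_low, u_up, x_disc_N)  # equally spaced points

print(discretize_stdNormal_space_sketch(x_disc_N=5))
```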

fesslix.tools.detect_bounds_x()
Syntax:

fesslix.tools.detect_bounds_x(rv, config_dict, q_low=1e-3, q_up=None, mode='ignore')

Description:

Evaluates the bounds x_low and x_up of the random variable rv from the quantile levels q_low and q_up and stores them in config_dict. Whether existing values in config_dict are overwritten is controlled by mode.

Parameters:
  • rv (flx.rv) – Random variable.

  • config_dict (flx_plot_config) – Configuration dictionary » after the call, the keys x_low and x_up are guaranteed to be assigned.

  • q_low (flx_pr) – Quantile level (probability) of the lower bound.

  • q_up (flx_pr) – Quantile level (probability) of the upper bound.

  • mode (str) –

    Controls how existing values of x_low and x_up in config_dict are handled. The following keywords are allowed:

    • 'ignore': ignore bounds of rv, if x_low and x_up are already set in config_dict

    • 'overwrite': use bounds of rv, even if x_low and x_up are already set in config_dict

    • 'minmax': use smallest value for bounds, if x_low and x_up are already set in config_dict

Returns:

None

Example:

rv = flx.rv({'type':'stdn'})
era_dict = { }
fesslix.tools.detect_bounds_x(rv, era_dict, q_low=1e-6, q_up=0.99)
print(era_dict)
{'x_low': -4.753424308822899, 'x_up': 2.3263478740408408}
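The handling of mode can be sketched in plain Python. In this hypothetical helper, the random variable is represented only by its quantile function (here the standard Normal inverse CDF from statistics.NormalDist), and the 'ignore' rule is applied key by key. Both choices are assumptions, as is the fallback q_up = 1 - q_low, which mirrors fesslix.tools.discretize_stdNormal_space(). The 'minmax' mode is left out of the sketch.

```python
from statistics import NormalDist

def detect_bounds_x_sketch(icdf, config_dict, q_low=1e-3, q_up=None, mode='ignore'):
    """Hypothetical sketch of fesslix.tools.detect_bounds_x();
    icdf is the quantile function (inverse CDF) standing in for the random variable."""
    if q_up is None:
        q_up = 1. - q_low                         # assumption, see lead-in
    bounds = {'x_low': icdf(q_low), 'x_up': icdf(q_up)}
    for key, value in bounds.items():
        if mode == 'ignore':
            config_dict.setdefault(key, value)    # keep a value that is already set
        elif mode == 'overwrite':
            config_dict[key] = value              # always use the bounds of the random variable
        else:
            raise NotImplementedError(f"mode '{mode}' is not covered by this sketch")

stdn = NormalDist()                               # standard Normal, like {'type':'stdn'}
d_new = {}
detect_bounds_x_sketch(stdn.inv_cdf, d_new, q_low=1e-6, q_up=0.99)
print(d_new)

d_ignore = {'x_low': -2.}
detect_bounds_x_sketch(stdn.inv_cdf, d_ignore, q_low=1e-6, q_up=0.99)                      # x_low stays -2.
d_overwrite = {'x_low': -2.}
detect_bounds_x_sketch(stdn.inv_cdf, d_overwrite, q_low=1e-6, q_up=0.99, mode='overwrite')  # x_low replaced
```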

Data fitting

fesslix.tools.discretize_x_from_data()
Syntax:

fesslix.tools.discretize_x_from_data(data, config={}, data_is_sorted=False, lower_bound=None, upper_bound=None)

Description:

Discretizes the parameter space into bins based on a data array.

Parameters:
  • data (numpy.ndarray) – Vector of data/samples.

  • config (dict) – Configuration dictionary (see below).

  • data_is_sorted (bool) – Set this to True if the values in data are sorted from smallest to largest.

  • lower_bound (float | None) – Value of an absolute lower bound. Set lower_bound to None if a lower bound does not exist.

  • upper_bound (float | None) – Value of an absolute upper bound. Set upper_bound to None if an upper bound does not exist.

Configuration dictionary:

The following keys are allowed in the configuration dictionary config:

  • mode (Word, default: adaptive): Mode for the discretization of data into bins. The following modes are supported:

    • adaptive: The bin size is selected adaptively based on a minimum number of data-points per bin and on a minimum bin size.

      For this mode, the following keys are additionally accepted in the configuration dictionary:

      • N_points_per_bin_min (int, default: 100): Minimum number of data-points per bin. Any bin must contain at least N_points_per_bin_min data-points. The specified integer value must be positive.

      • dx_min (float): Minimum size of a bin in parameter space. Any bin must have a width of at least dx_min. If dx_min is not specified, the default value is chosen such that at most 8 bins fit into the interval between the 25% and the 75% quantile, i.e., dx_min equals one eighth of the interquartile range.

    • equidist_p: An equidistant grid in probability space is used to generate the bins.

      For this mode, the following keys are additionally accepted in the configuration dictionary:

      • N_bins (int, optional): Total number of bins.

      • N_points_per_bin (int, default: 100): Number of data-points per bin. This parameter is only considered if N_bins is not specified.

    • fixed_p: The user provides the grid layout.

      For this mode, the following key must be specified in the configuration dictionary:

      • p_vec (numpy.ndarray): A numpy array with the probabilities of the discretization points of the grid (i.e., the edges of the bins). The first entry in p_vec must be zero and the last entry must equal one.

  • tail_upper (dict, default: None): Sets information about the location of the start of the upper tail. The following key-value pairs are accepted:

    • p (float): Probability that a realization is smaller than or equal to the starting value of the tail (i.e., the CDF evaluated at x).

    • x (float): Starting value of the tail.

    • data (numpy.ndarray, optional): A vector of samples in the tail. If specified, these samples are used to fit the tail instead of the samples in the global data array.

  • tail_lower (dict, default: None): Sets information about the location of the start of the lower tail. Configuration is identical to tail_upper.
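
The configuration keys above can be assembled as ordinary Python dictionaries. The sketch below shows one configuration per mode plus an upper-tail specification, and computes by hand the default dx_min described for the adaptive mode (one eighth of the interquartile range). NumPy's default quantile estimator is assumed here; fesslix may estimate quantiles differently. The actual call is only shown as a comment, since it requires the fesslix engine.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=10_000)                 # synthetic samples for illustration

# Default dx_min of mode 'adaptive': at most 8 bins between the 25% and 75% quantiles
q25, q75 = np.quantile(data, [0.25, 0.75])
dx_min_default = (q75 - q25) / 8.

config_adaptive = {'mode': 'adaptive', 'N_points_per_bin_min': 200, 'dx_min': dx_min_default}
config_equidist = {'mode': 'equidist_p', 'N_bins': 20}
config_fixed = {'mode': 'fixed_p', 'p_vec': np.array([0., 0.1, 0.5, 0.9, 1.])}  # first 0, last 1

# Upper tail starting at the empirical 95% quantile
config_equidist['tail_upper'] = {'p': 0.95, 'x': float(np.quantile(data, 0.95))}

# With the fesslix engine loaded:
# bins = fesslix.tools.discretize_x_from_data(data, config=config_equidist)
```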

Returns:

A Python dictionary that contains the configuration of data into bins.

The returned dict has the following structure:

  • N_total (int): The total number of data-points in the array specified by the input parameter data.

  • N_bins (int): The total number of bins generated.

  • q_vec (numpy.ndarray): Vector of quantiles of the edges of the bins (of size N_bins+1).

  • p_vec (numpy.ndarray): Vector of probabilities associated with the values in q_vec (of size N_bins+1).

  • N_vec (numpy.ndarray): Number of data-points that fall into the individual bins (of size N_bins+1).

  • tail_upper (dict): A configuration dictionary for modelling the upper tail. The structure of the dict corresponds to the one returned by fesslix.tools.fit_tail_to_data().

  • tail_lower (dict): A configuration dictionary for modelling the lower tail. The structure of the dict corresponds to the one returned by fesslix.tools.fit_tail_to_data().

  • type (Word): Set to quantiles, so that the returned configuration dictionary can directly be used to generate a quantile-based Distribution.

  • interpol (Word): Mode of interpolation. For documentation, please see key interpol in the configuration of quantile-based Distribution.

  • use_tail_fit (bool): … see key use_tail_fit in the configuration of quantile-based Distribution.

  • bin_rvbeta_params (bool): … see key bin_rvbeta_params in the configuration of quantile-based Distribution.

  • bin_rvlinear_params (bool): … see key bin_rvlinear_params in the configuration of quantile-based Distribution.

The returned dict can directly (i.e., without modification) be used as configuration dictionary to generate a quantile-based Distribution.

Return type:

dict
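
To illustrate how N_total, N_bins, p_vec, q_vec and N_vec relate to each other, the following hypothetical sketch builds these fields for an equidistant grid in probability space with NumPy. It is not the fesslix algorithm: tails are ignored, empirical quantiles use NumPy's default estimator, and N_vec holds one count per bin.

```python
import numpy as np

def equidist_p_bins_sketch(data, N_bins):
    """Hypothetical illustration of some fields returned by
    fesslix.tools.discretize_x_from_data() for mode 'equidist_p'."""
    data = np.sort(np.asarray(data, dtype=float))
    p_vec = np.linspace(0., 1., N_bins + 1)    # bin edges in probability space
    q_vec = np.quantile(data, p_vec)           # corresponding edges in parameter space
    N_vec, _ = np.histogram(data, bins=q_vec)  # data-points per bin (last bin includes its right edge)
    return {'N_total': data.size, 'N_bins': N_bins, 'p_vec': p_vec, 'q_vec': q_vec, 'N_vec': N_vec}

rng = np.random.default_rng(7)
bins = equidist_p_bins_sketch(rng.exponential(size=1_000), N_bins=10)
print(bins['q_vec'])
print(bins['N_vec'])   # roughly N_total / N_bins points per bin
```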

Examples:

Usage of this function is demonstrated in the examples of the quantile-based Distribution.

fesslix.tools.fit_tail_to_data()
Syntax:

fesslix.tools.fit_tail_to_data(tail_data_transformed, bound=None)

Description:

Fits a probabilistic model to the data-points associated with the tails of a distribution.

This function is called internally by fesslix.tools.discretize_x_from_data() for fitting the lower and upper tail to data.

Parameters:
  • tail_data_transformed (numpy.ndarray) – The data-points associated with the tail. The values need to be transformed such that they are all non-negative: a value of zero corresponds to the cutting-quantile of the tail, and the larger the value, the further the point lies in the tail.

  • bound (float | None) – An absolute upper bound for the tail (None if no bound exists). The value must be transformed in the same way as the data in tail_data_transformed.

Supported probabilistic models for the tail:

Currently, the following probabilistic distribution models for the tail are supported:

Returns:

A Python dictionary that contains the configuration for the probabilistic model of the tail.

The returned dict has the following structure:

  • models (dict): A Python dictionary that contains all fitted models. All supported models are listed above. The key (of type flx_rv_type) corresponds to the respective model. The value is a Python dictionary with the following structure:

    • type (flx_rv_type): The type of the tail model.

    • pdf_0 (float): The value of the PDF at zero.

    • nll (float): Value of the negative log-likelihood of the data (w.r.t. the fitted probabilistic model).

    • kstest_D (float): KS test statistic

    • kstest_p (float): p value from KS test

    Additionally, for each probabilistic model, all parameters are specified such that the returned dict can be used as configuration dict to define a random variable (flx.rv.__init__()).

  • best_model (flx_rv_type): A reference to the model in models with the smallest value of the negative log likelihood (nll).

  • use_model (flx_rv_type): A reference to the model in models that should actually be used as probabilistic model for the tail. By default, this value is set equal to the value of best_model. This value needs to be modified by the user if a model different from the one with the largest likelihood should be used. If use_model is set to "None" (as string), no probabilistic distribution model is associated with the tail.

Return type:

dict
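
The transformation expected for tail_data_transformed and bound can be sketched with NumPy. The helper below is hypothetical: it shifts (and, for a lower tail, mirrors) the samples beyond the tail start so that zero marks the cutting-quantile and values grow further into the tail, following the parameter description above. Which samples fesslix itself assigns to a tail is not specified here, and the call to fesslix is shown only as a comment.

```python
import numpy as np

def transform_tail_data(data, x_tail, side, bound=None):
    """Hypothetical helper: map the samples of one tail to non-negative values
    (0 at the tail start x_tail, growing further into the tail)."""
    data = np.asarray(data, dtype=float)
    if side == 'upper':
        tail = data[data >= x_tail] - x_tail
        t_bound = None if bound is None else bound - x_tail
    elif side == 'lower':
        # mirror the lower tail; an absolute lower bound becomes an upper bound after mirroring
        tail = x_tail - data[data <= x_tail]
        t_bound = None if bound is None else x_tail - bound
    else:
        raise ValueError("side must be 'upper' or 'lower'")
    return tail, t_bound

rng = np.random.default_rng(42)
data = rng.uniform(0., 10., size=5_000)   # samples with absolute bounds 0 and 10
x_up = np.quantile(data, 0.95)            # start of the upper tail
x_lo = np.quantile(data, 0.05)            # start of the lower tail
tail_up, bound_up = transform_tail_data(data, x_up, 'upper', bound=10.)
tail_lo, bound_lo = transform_tail_data(data, x_lo, 'lower', bound=0.)

# With the fesslix engine loaded, each tail could then be fitted, e.g.:
# fit_up = fesslix.tools.fit_tail_to_data(tail_up, bound=bound_up)
```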

fesslix.tools.fit_pdf_based_on_qvec()
Syntax:

fesslix.tools.fit_pdf_based_on_qvec(data, config)

Description:

Fits a PDF based on linear interpolation to a data vector.

Parameters:
Returns:

None

Modification of the configuration dictionary:

The configuration dictionary config is extended by this function. Specifically, the key pdf_vec is added and the key interpol is changed to pdf_linear (compare quantile-based Distribution).

Examples:

Usage of this function is demonstrated in the examples of the quantile-based Distribution.