tools - a collection of useful features

Todo

Write this section.

import fesslix as flx
flx.load_engine()
import fesslix.tools
import fesslix.model_templates as flx_model_templates
import fesslix.plot as flx_plot

import matplotlib.pyplot as plt
import numpy as np

Random Number Generator: MT19937 - initialized with rand()=407372254;
Random Number Generator: MT19937 - initialized with 1000 initial calls.

Working with files

fesslix.tools.replace_in_template()

Syntax:

fesslix.tools.replace_in_template(fn_in, fn_out, dmap, var_indi_start="@{", var_indi_end="}")

Description:

Replaces expressions of the type @{VARNAME} in file fn_in with the values in dmap and writes the processed file to fn_out.

If VARNAME starts with a !, the characters after the ! are interpreted as an expression of type rvFullID, and the value of the associated random variable is inserted.

Parameters:

fn_in (str) – File name of the template input file.
fn_out (str) – File name of the output file to generate.
dmap (dict) – A dictionary of all variables that can potentially appear in the file fn_in.
var_indi_start (str) – Unique expression (w.r.t. structure of file) for the beginning of the expression to replace.
var_indi_end (str) – Unique expression (w.r.t. structure of file) for the ending of the expression to replace.

Return type:

None

Example:

## ================================================================
## Generate a dictionary
## ================================================================
dmap = {}
dmap['var1'] = 42.42
dmap['var2'] = "Hello world!"

## ================================================================
## Generate a set of random variables
## ================================================================
config_rv_a1 = { 'name':'rv1', 'type':'stdn' }
config_rv_a2 = { 'name':'rv2', 'type':'logn', 'mu':1., 'sd':2. }
rv_set_a = flx.rv_set( {'name':'rv_set_a'}, [ config_rv_a1, config_rv_a2 ] )
sampler_a = flx.sampler(['rv_set_a'])
sampler_a.sample()
    
## ================================================================
## Open template file and start replacing
## ================================================================
fesslix.tools.replace_in_template(fn_in="../data/sample_text.txt", fn_out="modified_text.txt", dmap=dmap)

The content of the template file is:

!cat ../data/sample_text.txt

@{var2} » This is an example text file @{var1}. 

The content is to be modified by »fesslix.tools.replace_in_template«.

Value of random variable: @{!rv_set_a::rv1}

The content of the generated output file is:

!cat modified_text.txt

Hello world! » This is an example text file 42.42. 

The content is to be modified by »fesslix.tools.replace_in_template«.

Value of random variable: -0.110861
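
The placeholder substitution described above can be pictured with a few lines of plain Python. This is only a minimal sketch operating on a string, not the Fesslix implementation: the helper name replace_placeholders is hypothetical, and entries starting with ! are left untouched because resolving them requires the Fesslix engine.

```python
import re

def replace_placeholders(text, dmap, var_indi_start="@{", var_indi_end="}"):
    """Replace @{VARNAME}-style markers in text with values from dmap (illustrative only)."""
    pattern = re.compile(
        re.escape(var_indi_start) + r"(.*?)" + re.escape(var_indi_end)
    )

    def substitute(match):
        name = match.group(1)
        if name.startswith("!"):
            # Random-variable references need the Fesslix engine; keep them as-is here.
            return match.group(0)
        return str(dmap[name])

    return pattern.sub(substitute, text)

template = "@{var2} - example with @{var1}."
result = replace_placeholders(template, {"var1": 42.42, "var2": "Hello world!"})
print(result)
```

Passing other values for var_indi_start and var_indi_end changes the marker syntax, which mirrors the role of the two optional parameters documented above.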

Discretization

fesslix.tools.discretize_x()

Syntax:

fesslix.tools.discretize_x(x_low, x_up, x_disc_N=int(1e3), x_disc_shift=False, x_disc_on_log=False)

Description:

Discretizes a domain between x_low and x_up either linearly or in log scale.

Parameters:

x_low (float) – Lower bound of the discretization interval.
x_up (float) – Upper bound of the discretization interval.
x_disc_N (int) – Number of discretization points to generate.
x_disc_shift (bool) – If True, the start and end points are shifted inward by half the mesh size, so that x_low and x_up are not part of the returned set of discretization points.
x_disc_on_log (bool) – If True, the discretization is performed in log-space (and both x_low and x_up must be larger than zero).

Returns:

The x_disc_N points of discretization.

Return type:

numpy.ndarray

Example:

print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=True, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0.01, x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=True) )

[0.   0.25 0.5  0.75 1.  ]
[0.1 0.3 0.5 0.7 0.9]
[0.01       0.03162278 0.1        0.31622777 1.        ]
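
The three outputs above can be reproduced with plain NumPy. The following is a sketch of one way to obtain the same points, not the library code: the function name discretize_sketch is hypothetical, and applying the half-cell shift in log-space when both options are active is an assumption.

```python
import numpy as np

def discretize_sketch(x_low, x_up, n, shift=False, on_log=False):
    """Illustrative re-creation of the documented discretization behaviour."""
    # Work in log-space if requested (requires x_low > 0 and x_up > 0).
    lo, up = (np.log(x_low), np.log(x_up)) if on_log else (x_low, x_up)
    if shift:
        # n cells of width h; return the cell mid-points, so the bounds are excluded.
        h = (up - lo) / n
        pts = lo + h * (np.arange(n) + 0.5)
    else:
        # n points including both bounds.
        pts = np.linspace(lo, up, n)
    return np.exp(pts) if on_log else pts

a = discretize_sketch(0.0, 1.0, 5)
b = discretize_sketch(0.0, 1.0, 5, shift=True)
c = discretize_sketch(0.01, 1.0, 5, on_log=True)
print(a, b, c, sep="\n")
```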

fesslix.tools.discretize_x_get_diff()

Syntax:

fesslix.tools.discretize_x_get_diff(x_low, x_up, x_disc_N=int(1e3), x_disc_on_log=False)

Description:

Discretizes a domain between x_low and x_up. In addition to the discretization points, information about the grid size is returned.

Internally, this function calls fesslix.tools.discretize_x() with x_disc_shift=True. The returned discretization points x are then interpreted as the mid-points of a discretization grid and the size of each grid element is returned as a second vector dx.

Parameters:

x_low (float) – Lower bound of the discretization interval.
x_up (float) – Upper bound of the discretization interval.
x_disc_N (int) – Number of discretization points to generate.
x_disc_on_log (bool) – If True, the discretization is performed in log-space (and both x_low and x_up must be larger than zero).

Returns:

(x, dx) » x: mid-points of the grid elements; dx: sizes of the grid elements

Return type:

(numpy.ndarray, numpy.ndarray)

Example:

x, dx = fesslix.tools.discretize_x_get_diff(x_low=0., x_up=1., x_disc_N=5, x_disc_on_log=False) 
print(x)
print(dx)

[0.1 0.3 0.5 0.7 0.9]
[0.2 0.2 0.2 0.2 0.2]
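
A typical use of mid-points and cell sizes is numerical integration with the midpoint rule. The sketch below rebuilds x and dx with NumPy for the linear case only (the helper name midpoints_and_widths is hypothetical) and integrates x² over [0, 1].

```python
import numpy as np

def midpoints_and_widths(x_low, x_up, n):
    """Mid-points and widths of n equally sized cells on [x_low, x_up]."""
    edges = np.linspace(x_low, x_up, n + 1)
    x = 0.5 * (edges[:-1] + edges[1:])   # cell mid-points
    dx = np.diff(edges)                  # cell widths
    return x, dx

x, dx = midpoints_and_widths(0.0, 1.0, 5)
# Midpoint rule: sum of f(mid-point) * cell width.
integral = np.sum(x**2 * dx)
print(x, dx, integral)
```

With only five cells the midpoint rule gives 0.33 for an exact value of 1/3; increasing the number of cells reduces the error.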

fesslix.tools.discretize_stdNormal_space()

Syntax:

fesslix.tools.discretize_stdNormal_space(q_low=1e-3, q_up=None, x_disc_N=int(1e3))

Description:

Returns equally spaced discretization points in the standard Normal domain. The parameters q_low and q_up are used to define the bounds in standard Normal space.

Parameters:

q_low (float) – Probability level of the lower bound (the bound is the associated standard Normal quantile).
q_up (float) – Probability level of the upper bound. If None is specified, q_up=1.-q_low is assigned.
x_disc_N (int) – Number of discretization points to generate.

Returns:

The x_disc_N points of discretization.

Return type:

numpy.ndarray

Example:

print( fesslix.tools.discretize_stdNormal_space( x_disc_N=5 ) )

[-3.09023231 -1.54511615  0.          1.54511615  3.09023231]
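
The output above is consistent with mapping q_low and q_up through the inverse standard Normal CDF and spacing the points equally in between. A minimal sketch under that reading, using only the Python standard library and NumPy (the function name std_normal_grid is hypothetical):

```python
from statistics import NormalDist

import numpy as np

def std_normal_grid(q_low=1e-3, q_up=None, n=1000):
    """Equally spaced points between two standard Normal quantiles (illustrative)."""
    if q_up is None:
        q_up = 1.0 - q_low
    std_normal = NormalDist()            # mean 0, standard deviation 1
    u_low = std_normal.inv_cdf(q_low)    # lower bound in standard Normal space
    u_up = std_normal.inv_cdf(q_up)      # upper bound in standard Normal space
    return np.linspace(u_low, u_up, n)

grid = std_normal_grid(n=5)
print(grid)
```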

fesslix.tools.detect_bounds_x()

Syntax:

fesslix.tools.detect_bounds_x(rv, config_dict, q_low=1e-3, q_up=None, mode='ignore')

Description:

Evaluates and sets the bounds x_low and x_up of random variable rv based on the quantile values q_low and q_up. Whether existing values are overwritten is controlled by mode.

Parameters:

rv (flx.rv) – random variable
config_dict (flx_plot_config) – configuration dictionary » This function ensures that the parameters x_low and x_up are assigned.
q_low (flx_pr) – Quantile value for lower bound.
q_up (flx_pr) – Quantile value for upper bound.
mode (str) –
Controls how existing values of x_low and x_up in config_dict are handled. The following keywords are allowed:
- 'ignore': ignore bounds of rv, if x_low and x_up are already set in config_dict
- 'overwrite': use bounds of rv, even if x_low and x_up are already set in config_dict
- 'minmax': use smallest value for bounds, if x_low and x_up are already set in config_dict

Returns:

None

Example:

rv = flx.rv({'type':'stdn'})
era_dict = { }
fesslix.tools.detect_bounds_x(rv, era_dict,q_low=1e-6,q_up=0.99)
print(era_dict)

{'x_low': -4.753424308822899, 'x_up': 2.3263478740408408}
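
For a standard Normal variable, the bounds in the example are the quantiles at 1e-6 and 0.99. The sketch below mimics only the 'ignore' and 'overwrite' modes without Fesslix. It rests on several assumptions: the helper name is hypothetical, each key is handled independently, q_up=None is taken to mean 1-q_low, and 'minmax' is deliberately left out.

```python
from statistics import NormalDist

def detect_bounds_sketch(inv_cdf, config, q_low=1e-3, q_up=None, mode="ignore"):
    """Fill x_low/x_up in config from quantiles (illustrative, not the Fesslix code)."""
    if q_up is None:
        q_up = 1.0 - q_low               # assumption, mirrors discretize_stdNormal_space
    candidates = {"x_low": inv_cdf(q_low), "x_up": inv_cdf(q_up)}
    for key, value in candidates.items():
        if mode == "overwrite" or key not in config:
            config[key] = value
        elif mode != "ignore":
            raise ValueError(f"mode '{mode}' is not covered by this sketch")

inv_cdf = NormalDist().inv_cdf

era_dict = {}
detect_bounds_sketch(inv_cdf, era_dict, q_low=1e-6, q_up=0.99)
print(era_dict)

preset = {"x_low": -1.0}
detect_bounds_sketch(inv_cdf, preset, q_low=1e-6, q_up=0.99, mode="ignore")
print(preset)   # x_low keeps its preset value, x_up is filled in
```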

Data fitting

fesslix.tools.discretize_x_from_data()

Syntax:

fesslix.tools.discretize_x_from_data(data, config={}, data_is_sorted=False, lower_bound=None, upper_bound=None)

Description:

Discretize the parameter space into bins based on a data array.

Parameters:

data (numpy.ndarray) – vector of data/samples
config (dict) – configuration dictionary
data_is_sorted (bool) – Set this to True if the values in data are sorted from smallest to largest.
lower_bound (float | None) – Value of an absolute lower bound. Set lower_bound to None if a lower bound does not exist.
upper_bound (float | None) – Value of an absolute upper bound. Set upper_bound to None if an upper bound does not exist.

Configuration dictionary:

The following keys are allowed in the configuration dictionary config:

mode (Word, default:adaptive): mode for the discretization of data into bins. The following modes for discretization are supported:
- adaptive: The bin size is selected adaptively based on a minimum number of data-points per bin and on a minimum bin size.
  For this mode, the following keys are additionally accepted in the configuration dictionary:
  
  N_points_per_bin_min (int, default: 100): Minimum number of data-points per bin. Any bin must contain at least N_points_per_bin_min data-points. The specified integer value must be positive.
  
  dx_min (float): Minimum size of a bin in parameter space. Any bin must have at least a width of dx_min. If dx_min is not specified, the default value is assigned such that at most 8 bins fit into the interval spanned by the 75% and 25% quantile.
- equidist_p: An equidistant grid in probability space is used to generate the bins.
  For this mode, the following keys are additionally accepted in the configuration dictionary:
  
  N_bins (int, optional): Total number of bins.
  
  N_points_per_bin (int, default: 100): Number of data-points per bin. This parameter is only considered if N_bins is not specified.
- fixed_p: The user provides the grid layout.
  For this mode, the following key must be specified in the configuration dictionary:
  
  p_vec (numpy.ndarray): A numpy array with the probabilities of the discretization points of the grid (i.e., the edges of the bins). The first entry in p_vec must be zero and the last entry must equal one.
tail_upper (dict, default:None): Sets information about the location of the start of the upper tail. The following key-value pairs are accepted:
- p (float): Probability value that the distribution is smaller or equal than the starting value of the tail.
- x (float): Starting value of the tail.
- data (numpy.ndarray, optional): A vector of samples in the tail. If specified, these samples are used to fit the tail instead of the samples in the global data array.
tail_lower (dict, default:None): Sets information about the location of the start of the lower tail. Configuration is identical to tail_upper.
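
As an illustration of the keys listed above, the following dictionaries show what a configuration for each of the three modes might look like. The key names are taken from the list; all numeric values are arbitrary examples, and writing the mode as a Python string is an assumption.

```python
import numpy as np

# Adaptive bins: at least 200 points per bin and a minimum bin width of 0.05.
config_adaptive = {"mode": "adaptive", "N_points_per_bin_min": 200, "dx_min": 0.05}

# Equidistant grid in probability space with a fixed number of bins.
config_equidist = {"mode": "equidist_p", "N_bins": 20}

# User-defined grid: probabilities of the bin edges, starting at 0 and ending at 1.
config_fixed = {"mode": "fixed_p", "p_vec": np.array([0.0, 0.1, 0.5, 0.9, 1.0])}

# Sanity check of the documented requirements on p_vec.
p_vec = config_fixed["p_vec"]
p_vec_is_valid = p_vec[0] == 0.0 and p_vec[-1] == 1.0 and bool(np.all(np.diff(p_vec) > 0))
print(p_vec_is_valid)
```

Requiring strictly increasing entries in p_vec is an additional assumption of this check; the documentation only states the conditions on the first and last entry.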

Returns:

A Python dictionary that contains the configuration of data into bins.

The returned dict has the following structure:

N_total (int): The total number of data-points in the array specified by the input parameter data.

N_bins (int): The total number of bins generated.

q_vec (numpy.ndarray): Vector of quantiles of the edges of the bins (of size N_bins+1).

p_vec (numpy.ndarray): Vector of probabilities associated with the values in q_vec (of size N_bins+1).

N_vec (numpy.ndarray): Number of data-points that fall into the individual bins (of size N_bins+1).

tail_upper (dict): A configuration dictionary for modelling the upper tail. The structure of the dict corresponds to the one returned by fesslix.tools.fit_tail_to_data().

tail_lower (dict): A configuration dictionary for modelling the lower tail. The structure of the dict corresponds to the one returned by fesslix.tools.fit_tail_to_data().

type (Word): Set to quantiles, so that the returned configuration dictionary can directly be used to generate a quantile-based Distribution.

interpol (Word): Mode of interpolation. For documentation, please see key interpol in the configuration of quantile-based Distribution.

use_tail_fit (bool): … see key use_tail_fit in the configuration of quantile-based Distribution.

bin_rvbeta_params (bool): … see key bin_rvbeta_params in the configuration of quantile-based Distribution.

bin_rvlinear_params (bool): … see key bin_rvlinear_params in the configuration of quantile-based Distribution.

The returned dict can directly (i.e., without modification) be used as configuration dictionary to generate a quantile-based Distribution.

Return type:

dict

Examples:

Usage of this function is demonstrated in the examples of the quantile-based Distribution.
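
To give a feeling for the equidist_p mode, the sketch below builds an equidistant grid in probability space and derives analogues of p_vec, q_vec and the per-bin counts with NumPy. It is a simplified illustration, not the Fesslix algorithm: plain empirical quantiles from np.quantile are an assumption, and tails and bounds are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

n_bins = 10
# Probabilities of the bin edges: equally spaced between 0 and 1.
p_vec = np.linspace(0.0, 1.0, n_bins + 1)
# Bin edges in parameter space: empirical quantiles at those probabilities.
q_vec = np.quantile(data, p_vec)
# Number of data-points falling into each bin.
counts, _ = np.histogram(data, bins=q_vec)

print(p_vec)
print(q_vec)
print(counts)
```

Because the edges are quantiles of the data, each bin holds roughly the same number of points, while the bin widths adapt to the density of the data.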

fesslix.tools.fit_tail_to_data()

Syntax:

fesslix.tools.fit_tail_to_data(tail_data_transformed, bound=None)

Description:

Fits a probabilistic model to the data-points associated with the tails of a distribution.

This function is called internally by fesslix.tools.discretize_x_from_data() for fitting the lower and upper tail to data.

Parameters:

tail_data_transformed (numpy.ndarray) – The data-points associated with the tail. The values need to be transformed such that they are all positive. A value of zero is associated with the cutting-quantile of the tail. The larger the value, the further the point is in the tail.
bound (float) – A value specifying an absolute upper value (bound) for the tail. The value needs to be transformed in the same way as the data in tail_data_transformed.
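
One possible reading of the transformation described for tail_data_transformed is to measure each tail sample by its distance from the cutting quantile. The sketch below applies that idea to both tails. The cut probability, the variable names and the example bound are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=2000)

# Upper tail: distance above the cutting quantile (zero at the cut, growing into the tail).
x_cut_up = np.quantile(data, 0.95)
upper_tail = data[data > x_cut_up] - x_cut_up

# Lower tail: mirrored, so values are again positive and grow into the tail.
x_cut_low = np.quantile(data, 0.05)
lower_tail = x_cut_low - data[data < x_cut_low]

# A hypothetical absolute upper bound of 6.0, transformed the same way as the upper tail.
bound_transformed = 6.0 - x_cut_up

print(upper_tail.size, lower_tail.size, bound_transformed)
```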

Supported probabilistic models for the tail:

Currently, the following probabilistic distribution models for the tail are supported:

genpareto » generalized Pareto distribution
logn » Log-normal distribution
beta » Beta distribution (only if bound is not None)

Returns:

A Python dictionary that contains the configuration for the probabilistic model of the tail.

The returned dict has the following structure:

models (dict): A Python dictionary that contains all fitted models. All supported models are listed above. Each key (of type flx_rv_type) corresponds to the respective model. Each value is a Python dictionary with the following structure:

type (flx_rv_type): The type of the tail model.

pdf_0 (float): The value of the PDF at zero.

nll (float): Value of the negative log-likelihood of the data (w.r.t. the fitted probabilistic model).

kstest_D (float): KS test statistic

kstest_p (float): p value from KS test

Additionally, for each probabilistic model, all parameters are specified such that the returned dict can be used as configuration dict to define a random variable (flx.rv.__init__()).

best_model (flx_rv_type): A reference to the model in models with the smallest value of the negative log likelihood (nll).

use_model (flx_rv_type): A reference to the model in models that should actually be used as probabilistic model for the tail. By default, this value is set equal to the value of best_model. This value needs to be modified by the user if a model different from the one with the largest likelihood should be used. If use_model is set to "None" (as string), no probabilistic distribution model is associated with the tail.

Return type:

dict
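
The selection logic described above (fit several candidate models, record the negative log-likelihood and KS statistics, pick the model with the smallest nll) can be sketched with SciPy. This is an illustration under assumptions, not the Fesslix implementation: the use of scipy.stats, fixing the location at zero, and the synthetic exponential data are all choices made for this example, and the beta model is omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
tail = rng.exponential(scale=0.5, size=500)   # synthetic, already transformed tail data

candidates = {"genpareto": stats.genpareto, "logn": stats.lognorm}
models = {}
for name, dist in candidates.items():
    # Maximum-likelihood fit with the location fixed at the start of the tail.
    params = dist.fit(tail, floc=0.0)
    ks = stats.kstest(tail, dist.cdf, args=params)
    models[name] = {
        "type": name,
        "pdf_0": float(dist.pdf(0.0, *params)),
        "nll": float(-np.sum(dist.logpdf(tail, *params))),
        "kstest_D": float(ks.statistic),
        "kstest_p": float(ks.pvalue),
    }

# The best model is the one with the smallest negative log-likelihood.
best_model = min(models, key=lambda name: models[name]["nll"])
summary = {"models": models, "best_model": best_model, "use_model": best_model}
print(summary["best_model"])
```

Setting summary["use_model"] to a different key of models afterwards corresponds to the manual override described for use_model.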

fesslix.tools.fit_pdf_based_on_qvec()

Syntax:

fesslix.tools.fit_pdf_based_on_qvec(data, config)

Description:

Fits a PDF based on linear interpolation to a data vector.

Parameters:

data (numpy.ndarray) – vector of data/samples
config (dict) – Configuration dictionary, as returned by fesslix.tools.discretize_x_from_data().

Returns:

None

Modification of the configuration dictionary:

The configuration dictionary config is extended by this function. Specifically, the key pdf_vec is added and the key interpol is changed to pdf_linear (compare quantile-based Distribution).

Examples:

Usage of this function is demonstrated in the examples of the quantile-based Distribution.
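
To illustrate what a PDF based on linear interpolation looks like, the sketch below evaluates a piecewise-linear density defined by values at the bin edges. It assumes that pdf_vec holds the density at the points in q_vec, which is not confirmed by this page; the numbers describe a simple triangular density chosen for the example.

```python
import numpy as np

# Hypothetical bin edges and density values at those edges (triangular density on [0, 2]).
q_vec = np.array([0.0, 1.0, 2.0])
pdf_vec = np.array([0.0, 1.0, 0.0])

def pdf_linear(x):
    """Piecewise-linear density between the edges, zero outside [q_vec[0], q_vec[-1]]."""
    return np.interp(x, q_vec, pdf_vec, left=0.0, right=0.0)

# For a piecewise-linear function the trapezoidal rule over the edges is exact.
area = np.sum(0.5 * (pdf_vec[:-1] + pdf_vec[1:]) * np.diff(q_vec))

print(pdf_linear(0.5), pdf_linear(1.5), pdf_linear(3.0), area)
```

The area of one confirms that the chosen values form a valid density; a fitting routine would have to enforce this normalization.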
