quantile-based Distribution#
In the following, let \(X\) be a random variable that follows a distribution that is defined in terms of quantiles. An outcome of \(X\) is denoted as \(x\).
Syntax#
- property quantiles#
Quantile-based distribution
quantilesis a distribution type (flx_rv_type) for Random variables in Fesslix.- Parametrization:
The distribution expects a single parameter
configof type dict. The key-value pairs listed in the following can be specified inconfig:N_bins(int): the total number of bins; value must be positive.q_vec(numpy.ndarray): A sorted (from small to large) vector of quantiles that are used to define the CDF of the distribution. The size of the array must equalN_bins+1.p_vec(numpy.ndarray): A vector of probabilities associated with the quantiles specified inq_vec. The first value must be zero and the last value one.use_tail_fit(bool, default: False): If False, the first and last bin are treated as any other bin. If True, special models for the tails are considered.tail_lower(dict): Configuration dictionary for the lower tail. The structure of the dict corrsponds to the one returned byfesslix.tools.fit_tail_to_data().tail_upper(dict): Configuration dictionary for the upper tail. The structure of the dict corrsponds to the one returned byfesslix.tools.fit_tail_to_data().interpol(Word, default:uniform): Mode of interpolation between the quantile values. The following interpolation modes are supported:uniform: The PDF inside a bin is uniform (i.e., constant).pchip: The PCHIP interpolant takes non-equispaced data and interpolates between them via cubic Hermite polynomials whose slopes are chosen so that the resulting interpolant is monotonic.bin_beta: Each bin is modeled by a separate beta distribution. The parameters of the beta distributions need to be specified as entrybin_rvbeta_paramsinconfig, wherebin_rvbeta_paramsis a numpy.ndarray of sizeN_bins*2. The parameterbin_rvbeta_paramsis provided by default byfesslix.tools.discretize_x_from_data().bin_linear: Each bin is modeled by a separately by a linear function. The gradients of the bins need to be specified as entrybin_rvlinear_paramsinconfig, wherebin_rvlinear_paramsis a numpy.ndarray of sizeN_bins. The parameterbin_rvlineara_paramsis provided by default byfesslix.tools.discretize_x_from_data().pdf_linear: Linear interpolation is used to represent the PDF of the distribution. Therefore, the values inp_vecare NOT used in this case. Instead, values of the PDF associated with the values inq_vecneed to be specified as entrypdf_vecinconfig, wherepdf_vecis a numpy.ndarray of sizeN_bins+1. In order to generate parameterpdf_vec, the functionfesslix.tools.fit_pdf_based_on_qvec()needs to be called.
Hint
To generate a quantile-based distribution based on data (e.g., random samples), the function
fesslix.tools.discretize_x_from_data()can be used; this function takes a data-array as input and returns a dict-object that can directly be used to generate a quantile-based random variable.
Example#
use the default discretization of data into bins#
Initially, we start the engine, load the required modules and define some helping functions.
import fesslix as flx
flx.load_engine()
import fesslix.tools
import fesslix.plot
import fesslix.model_templates
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def test_rv(rv,throw=True):
for p in [1e-6,1e-3,1e-2,0.05,0.33,0.5,0.66,0.95,1-1e-2,1.-1e-3,1.-1e-6]:
x = rv.icdf(p)
p_ = rv.cdf(x)
print(f"{p:10.4e}, {p_:10.4e}, {x:8.2f}")
if throw:
p_ref = min(p,1.-p)
if abs(p-p_)/p_ref>1e-3:
raise NameError(f"ERROR 202504281221")
rv_lst = [ ]
Random Number Generator: MT19937 - initialized with rand()=167642503;
Random Number Generator: MT19937 - initialized with 1000 initial calls.
Let’s generate some data to work with:
## ==============================================
## Generate model
## ==============================================
my_model = fesslix.model_templates .generate_reliability_R_S_example()
## ==============================================
## Set up dataBox
## ==============================================
dBox_1 = flx.dataBox(my_model['sampler'].get_NOX(),len(my_model['model']))
## ----------------------------------------
## set up storing data in memory
## ----------------------------------------
dBox_1.write2mem( {
'N_reserve': int(1e6),
'cols': 'all'
} )
## ==============================================
## Perform the Monte Carlo simulation
## ==============================================
my_model['sampler'].perform_MCS(40000,my_model['model'],dBox_1)
## ==============================================
## Extract a data-column from dBox_1
## ==============================================
data_fvec = dBox_1.extract_col_from_mem( { 'set':'out', 'id':0} )
Next, we process to generated sample data into bins.
rv_q_config = fesslix.tools.discretize_x_from_data(data_fvec)
print(rv_q_config['N_bins'])
34
The obtained Python dict can be directly used to define a random variable that follows a quantile-based distribution.
rv_q_config['name'] = "default" ## specify a name (optional)
rv_0 = flx.rv(rv_q_config)
test_rv(rv_0)
rv_lst.append(rv_0)
1.0000e-06, 1.0000e-06, -6.63
1.0000e-03, 1.0000e-03, -2.80
1.0000e-02, 1.0000e-02, -1.14
5.0000e-02, 5.0000e-02, 0.33
3.3000e-01, 3.3000e-01, 3.01
5.0000e-01, 5.0000e-01, 4.00
6.6000e-01, 6.6000e-01, 4.93
9.5000e-01, 9.5000e-01, 7.74
9.9000e-01, 9.9000e-01, 9.30
9.9900e-01, 9.9900e-01, 11.04
1.0000e+00, 1.0000e+00, 14.73
Let’s plot the PDF of the distribution:
def gen_PDF():
fig, ax = plt.subplots(figsize=(10, 4))
for rv in rv_lst:
fesslix.plot.draw_pdf(ax, rv, config_dict={'label':rv.get_name()})
ax.set_ylim([0., None])
plt.xlabel(r"$x$")
plt.ylabel(r"$f_X(x)$")
plt.legend()
plt.show()
gen_PDF()
… and the CDF:
def gen_CDF():
fig, ax = plt.subplots(figsize=(10, 4))
for rv in rv_lst:
fesslix.plot.draw_cdf(ax, rv, config_dict={'label':rv.get_name()})
ax.set_ylim([0., 1.])
plt.xlabel(r"$x$")
plt.ylabel(r"$F_X(x)$")
plt.legend()
plt.show()
gen_CDF()
To change the type of interpolation to a linear interpolation of the PDF.
For this, we need to call fesslix.tools.fit_pdf_based_on_qvec().
## extend rv_q_config
fesslix.tools.fit_pdf_based_on_qvec(data_fvec, rv_q_config)
## define the random variable
rv_q_config['name'] = "pdf_linear" ## specify a name (optional)
rv_1 = flx.rv(rv_q_config)
## test and plot the random variable
test_rv(rv_1)
rv_lst.append(rv_1)
gen_PDF()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[7], line 2
1 ## extend rv_q_config
----> 2 fesslix.tools.fit_pdf_based_on_qvec(data_fvec, rv_q_config)
4 ## define the random variable
5 rv_q_config['name'] = "pdf_linear" ## specify a name (optional)
File ~/checkouts/readthedocs.org/user_builds/fesslix-docu/envs/latest/lib/python3.14/site-packages/fesslix/tools.py:747, in fit_pdf_based_on_qvec(data, config)
743 bounds = [(0., None) for _ in range(len(p0))] # for all p in pdf_vec >= 0
744 ## ==============================================
745 ## perform the optimization
746 ## ==============================================
--> 747 result = scipy.optimize.minimize(
748 neg_log_likelihood,
749 p0,
750 method='SLSQP',
751 bounds=bounds,
752 constraints=constraints,
753 options={'disp': True}
754 )
755 if not result.success:
756 raise RuntimeError("202504280945: optimization failed » " + result.message)
File ~/checkouts/readthedocs.org/user_builds/fesslix-docu/envs/latest/lib/python3.14/site-packages/scipy/optimize/_minimize.py:796, in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
793 res = _minimize_cobyqa(fun, x0, args, bounds, constraints, callback,
794 **options)
795 elif meth == 'slsqp':
--> 796 res = _minimize_slsqp(fun, x0, args, jac, bounds,
797 constraints, callback=callback, **options)
798 elif meth == 'trust-constr':
799 res = _minimize_trustregion_constr(fun, x0, args, jac, hess, hessp,
800 bounds, constraints,
801 callback=callback, **options)
File ~/checkouts/readthedocs.org/user_builds/fesslix-docu/envs/latest/lib/python3.14/site-packages/scipy/optimize/_slsqp_py.py:387, in _minimize_slsqp(func, x0, args, jac, bounds, constraints, maxiter, ftol, iprint, disp, eps, callback, finite_diff_rel_step, workers, **unknown_options)
373 exit_modes = {-1: "Gradient evaluation required (g & a)",
374 0: "Optimization terminated successfully",
375 1: "Function evaluation required (f & c)",
(...) 382 8: "Positive directional derivative for linesearch",
383 9: "Iteration limit reached"}
385 # Set the parameters that SLSQP will need
386 # meq, mieq: number of equality and inequality constraints
--> 387 meq = sum(map(len, [np.atleast_1d(c['fun'](x, *c['args']))
388 for c in cons['eq']]))
389 mieq = sum(map(len, [np.atleast_1d(c['fun'](x, *c['args']))
390 for c in cons['ineq']]))
391 # m = The total number of constraints
File ~/checkouts/readthedocs.org/user_builds/fesslix-docu/envs/latest/lib/python3.14/site-packages/fesslix/tools.py:738, in fit_pdf_based_on_qvec.<locals>.integral_constraint(p)
736 def integral_constraint(p):
737 pdf_vec[i_start:i_end] = p
--> 738 return np.trapz(pdf_vec, x_vec) - Pr_in
File ~/checkouts/readthedocs.org/user_builds/fesslix-docu/envs/latest/lib/python3.14/site-packages/numpy/__init__.py:792, in __getattr__(attr)
789 import numpy.char as char
790 return char.chararray
--> 792 raise AttributeError(f"module {__name__!r} has no attribute {attr!r}")
AttributeError: module 'numpy' has no attribute 'trapz'
Alternatively, PCHIP can be used for interpolation:
rv_q_config['interpol'] = 'pchip'
rv_q_config['name'] = "pchip" ## specify a name (optional)
rv_2 = flx.rv(rv_q_config)
## Note that the x2y- and y2x-transforms are not truly inverse functions of each other
## ... keep this in mind!
test_rv(rv_2,throw=False)
rv_lst.append(rv_2)
gen_PDF()
1.0000e-06, 1.0000e-06, -9.50
1.0000e-03, 1.0000e-03, -2.62
1.0000e-02, 9.9946e-03, -1.13
5.0000e-02, 4.9990e-02, 0.36
3.3000e-01, 3.3001e-01, 3.01
5.0000e-01, 5.0000e-01, 4.00
6.6000e-01, 6.6001e-01, 4.91
9.5000e-01, 9.4999e-01, 7.75
9.9000e-01, 9.9000e-01, 9.38
9.9900e-01, 9.9900e-01, 11.26
1.0000e+00, 1.0000e+00, 14.89
modify the tail-model#
To output the different tail-models available (for the upper tail):
for tmn, tmodel in rv_q_config['tail_upper']['models'].items():
print(f"{tmn}: {tmodel['nll']}")
genpareto: 70.00820273643549
logn: 87.88850222858704
We can also use the log-Normal model for the upper and lower tail:
## define the random variable
rv_q_config['tail_upper']['use_model'] = 'logn'
rv_q_config['tail_lower']['use_model'] = 'logn'
rv_q_config['interpol'] = 'pdf_linear'
rv_q_config['name'] = "pdf_linear_logn_tail" ## specify a name (optional)
rv = flx.rv(rv_q_config)
## test and plot the random variable
test_rv(rv)
rv_lst.append(rv)
gen_PDF()
1.0000e-06, 1.0000e-06, -121.51
1.0000e-03, 1.0000e-03, -4.03
1.0000e-02, 1.0000e-02, -1.12
5.0000e-02, 5.0000e-02, 0.36
3.3000e-01, 3.3000e-01, 3.00
5.0000e-01, 5.0000e-01, 4.00
6.6000e-01, 6.6000e-01, 4.91
9.5000e-01, 9.5000e-01, 7.75
9.9000e-01, 9.9000e-01, 9.37
9.9900e-01, 9.9900e-01, 12.72
1.0000e+00, 1.0000e+00, 136.22
We can also specify aboslute lower and upper bounds. In this case the beta distribution is also available as a model for the tails.
## re-run the discretization into bins - with fixed bounds.
rv_q_config = fesslix.tools.discretize_x_from_data(data_fvec,
lower_bound=-10., upper_bound=20.)
## output the tail models (for the upper tail)
for tmn, tmodel in rv_q_config['tail_upper']['models'].items():
print(f"{tmn}: {tmodel['nll']}")
## define the random variable
rv_q_config['tail_upper']['use_model'] = 'beta'
rv_q_config['tail_lower']['use_model'] = 'beta'
rv_q_config['name'] = "beta_tails" ## specify a name (optional)
rv = flx.rv(rv_q_config)
## test and plot the random variable
test_rv(rv)
rv_lst.append(rv)
gen_PDF()
genpareto: 70.00820273643549
logn: 87.88850222858704
beta: 70.00902305966775
1.0000e-06, 1.0000e-06, -6.56
1.0000e-03, 1.0000e-03, -2.68
1.0000e-02, 1.0000e-02, -1.14
5.0000e-02, 5.0000e-02, 0.35
3.3000e-01, 3.3000e-01, 3.01
5.0000e-01, 5.0000e-01, 4.00
6.6000e-01, 6.6000e-01, 4.91
9.5000e-01, 9.5000e-01, 7.75
9.9000e-01, 9.9000e-01, 9.39
9.9900e-01, 9.9900e-01, 11.26
1.0000e+00, 1.0000e+00, 15.07
modify the discretization into bins#
You can also use a uniform distribution of samples into bins:
## perform discretization into bins and define the random variable
rv_q_config = fesslix.tools.discretize_x_from_data(data_fvec, config={
'mode':'equidist_p',
'N_bins': 30
})
rv_q_config['name'] = "equidist_p" ## specify a name (optional)
rv = flx.rv(rv_q_config)
## test and plot the random variable
test_rv(rv)
rv_lst.append(rv)
gen_PDF()
1.0000e-06, 1.0000e-06, -6.10
1.0000e-03, 1.0000e-03, -2.83
1.0000e-02, 1.0000e-02, -1.12
5.0000e-02, 5.0000e-02, 0.32
3.3000e-01, 3.3000e-01, 3.01
5.0000e-01, 5.0000e-01, 4.00
6.6000e-01, 6.6000e-01, 4.91
9.5000e-01, 9.5000e-01, 7.79
9.9000e-01, 9.9000e-01, 9.37
9.9900e-01, 9.9900e-01, 11.29
1.0000e+00, 1.0000e+00, 14.85