gpr - Gaussian Process Regression

`gpr` - Gaussian Process Regression#

Theory#

basic definitions#

Our goal is to learn function \(f(\mathbf{x}_\star)\), where \(\mathbf{x}_\star\in\mathbb{R}^m\) is a \(m\) dimensional input vector. For learning, we have a total \(n\) noisy input/output observations of \(y(\mathbf{x}_i) = f(\mathbf{x}_i)+\varepsilon_i\), with \(i=1,\ldots,n\) and \(\varepsilon_i\) is independent identically distributed noise with variance \(\sigma_\varepsilon^2\). Let \(\mathbf{X}\) be the \(n\times m\) dimensional matrix of the observed input data-points; i.e., \(\mathbf{X}_i=\mathbf{x}_{(i)}^\operatorname{T}\) with \(\mathbf{x}_{(i)}\) as the \(i\)th observed input data-point of dimension \(m\). \(\mathbf{y}\) is the \(n\) dimensional vector of the observed noisy model output associated with \(\mathbf{X}\).

Using Gaussian process regression, the prior of function \(f(\mathbf{x}_\star)\) is expressed using the following surrogate model:

\[f(\mathbf{x}_\star) = \mathbf{b}(\mathbf{x}_\star)^\operatorname{T}\mathbf{a}+\mathbf{Z}(\mathbf{x}_\star)\;,\]

where \(\mathbf{b}(\mathbf{x}_\star)^\operatorname{T}\mathbf{a}\) is a trend function that depends on a \(k\) dimensional parameter vector \(\mathbf{a}\) and vector of shape functions \(\mathbf{b}(\cdot)\). \(\mathbf{Z}(\mathbf{x}_\star)\) is a zero mean Gaussian process with variance \(\sigma_Z^2\) and autocorrelation coefficient function \(r(\mathbf{x},\mathbf{x}^\prime)\).

joint distribution model#

The joint distribution of the observed target values and the function value at \(\mathbf{x}_\star\) is:

\[\begin{split} \left[ \begin{matrix} \mathbf{y}\\ f_\star \end{matrix} \right] = \mathcal{N}\left( \left[ \begin{matrix} \mathbf{B}\mathbf{a}\\ \mathbf{b}_\star^\operatorname{T}\mathbf{a} \end{matrix} \right] , \left[ \begin{matrix} \sigma_Z^2\mathbf{R}+\sigma_\varepsilon^2\mathbf{I} & \mathbf{k}_\star \\ \mathbf{k}_\star^\operatorname{T} & k_{\star\star} \end{matrix} \right] \right)\;, \end{split}\]

where

\(\mathbf{y}\) is the \(n\) dimensional vector of the observed noisy model output.
\(f_\star\) is the function value at \(\mathbf{x}_\star\) that is to be predicted,
\(\mathbf{b}_\star = \mathbf{b}(\mathbf{x}_\star)\) is the vector of shape functions evaluated at \(\mathbf{x}_\star\),
\(\mathbf{B}_{ij} = \mathbf{b}_j(\mathbf{x}_{(i)})\); i.e, \(\mathbf{B}\) is a \(n\times k\) dimensional matrix with coefficients as the \(j\)th shape function evaluated at \(\mathbf{x}_{(i)}\),
\(\mathbf{R}_{ij} = r\left(\mathbf{x}_{(i)},\mathbf{x}_{(j)}\right)\),
\(\left(\mathbf{k}_{\star}\right)_j = \sigma_Z^2 \cdot r\left(\mathbf{x}_{(j)},\mathbf{x}_{\star}\right)\), and
\(k_{\star\star} = \sigma_Z^2 \cdot r\left(\mathbf{x}_{\star},\mathbf{x}_{\star}\right) = \sigma_Z^2\).

predictive distribution#

\[ \operatorname{E}\left[f_\star\right] = \mathbf{b}_\star^\operatorname{T}\mathbf{a} + \mathbf{k}_\star^\operatorname{T}\left(\sigma_Z^2\mathbf{R}+\sigma_\varepsilon^2\mathbf{I}\right)^{-1}\left(y-\mathbf{B}\mathbf{a}\right) \]

\[ \operatorname{Var}\left[f_\star\right] = k_{\star\star} - \mathbf{k}_\star^\operatorname{T}\left(\sigma_Z^2\mathbf{R}+\sigma_\varepsilon^2\mathbf{I}\right)^{-1} \mathbf{k}_\star \]

log marginal likelihood#

\[\begin{split} \begin{align*} \ln(\mathcal{L}) = &-\frac{1}{2} \left(y-\mathbf{B}\mathbf{a}\right)^\operatorname{T}\left(\sigma_Z^2\mathbf{R}+\sigma_\varepsilon^2\mathbf{I}\right)^{-1}\left(y-\mathbf{B}\mathbf{a}\right) \\ &- \frac{1}{2} \log\left|\sigma_Z^2\mathbf{R}+\sigma_\varepsilon^2\mathbf{I}\right| \\ &- \frac{n}{2} \log\left(2\pi\right) \end{align*} \end{split}\]

least squares estimation#

introduction#

Note that the covariance of the noisy model output can be re-written as:

\[ \sigma_Z^2\mathbf{R}+\sigma_\varepsilon^2\mathbf{I} = \sigma_Y^2 \left( \sigma_Z^2/\sigma_Y^2\mathbf{R} + \sigma_\varepsilon^2/\sigma_Y^2\mathbf{I} \right) = \sigma_Y^2 \mathbf{Q} \;, \]

with \(\sigma_Y^2 = \sigma_Z^2+\sigma_\varepsilon^2\).

Thus, the predictive distribution can also be stated as:

\[ \operatorname{E}\left[f_\star\right] = \mathbf{b}_\star^\operatorname{T}\mathbf{a} + \mathbf{k}_\star^\operatorname{T}\left(\sigma_Y^2\mathbf{Q}\right)^{-1}\left(y-\mathbf{B}\mathbf{a}\right) \]

\[ \operatorname{Var}\left[f_\star\right] = k_{\star\star} - \mathbf{k}_\star^\operatorname{T}\left(\sigma_Y^2\mathbf{Q}\right)^{-1} \mathbf{k}_\star \]

\[\begin{split} \begin{array}{lll} \ln(\mathcal{L}) &= &-\frac{1}{2} \left(y-\mathbf{B}\mathbf{a}\right)^\operatorname{T}\left(\sigma_Y^2\mathbf{Q}\right)^{-1}\left(y-\mathbf{B}\mathbf{a}\right) \\ & &- \frac{1}{2} \log\left|\sigma_Y^2\mathbf{Q}\right| \\ & &- \frac{n}{2} \log\left(2\pi\right) \\ &= &-\frac{1}{2\sigma_Y^2} \left(y-\mathbf{B}\mathbf{a}\right)^\operatorname{T}\mathbf{Q}^{-1}\left(y-\mathbf{B}\mathbf{a}\right) \\ & &- \frac{n}{2} \log\left(\sigma_Y^2\right) - \frac{1}{2} \log\left|\mathbf{Q}\right| \\ & &- \frac{n}{2} \log\left(2\pi\right) \end{array} \end{split}\]

least squares estimation of the trend function#

It can be shown that conditional on \(\mathbf{Q}\), the vector \(\hat{\mathbf{a}}\) that maximizes the likelihoood function is:

\[ \hat{\mathbf{a}} = \left(\mathbf{B}^\operatorname{T}\mathbf{Q}^{-1}\mathbf{B}\right)^{-1} \mathbf{B}^\operatorname{T}\mathbf{Q}^{-1}\mathbf{y} \]

least squares estimation of \(\sigma_Y^2\)#

It can be shown that conditional on \(\mathbf{Q}\) and \(\mathbf{a}\), the \(\hat{\sigma}_Y^2\) that maximizes the likelihoood function is:

\[ \hat{\sigma}_Y^2 = \frac{1}{n} \left(y-\mathbf{B}\mathbf{a}\right)^\operatorname{T}\mathbf{Q}^{-1}\left(y-\mathbf{B}\mathbf{a}\right) \]

Syntax#

class flx.gpr.gp#

Represents/manages a Gaussian process.

A Gaussian process is completely defined by its mean function (trend) and autocovariance function (kernel). For Gaussian Process Regression (GPR), the process is conditioned on data (of the model input and output). The functions representing the trend and the kernel can utilize so-called process parameters. The values of these parameters are optimized such that the likelihood of observing the provided data is maximized.

__init__(config)#

Defines the (unconditional) Gaussian process.

Parameters:: config (dict) – Configuration directory. The structure of config is outlined in detail in the following.

General properties:

The following properties are required/allowed in config, independent of the type of the process.

name (Word, default:”name_unspecified”): The name to assign to the process.
Ndim (int): The dimension of the Gaussian process; i.e., the number of variables for the model input. Value must be a positive integer.
type (Word, default:"singleGP"): The type of the process. Can either be set to "singleGP" or "mavgGP". Properties associated with the different types are documented in the following.
useLSE (bool, default:True):

If set to True, the process parameters associated with the function for the mean value and standard deviation are directly identified (conditional on the other process parameters) using least squares. The remaining Ndim process parameters for the correlation length can be identified through a numerical optimization (see flx.gpr.gp.optimize()). If set to False, all process parameters need to be identified through numerical optimization (see flx.gpr.gp.optimize()).
descr (str, default:””): An arbitrary description for the process.

Processes of type "singleGP":

… represent a “standard” Gaussian process. The following properties are required/allowed in config:

mean_type (Word, default:”zero”): The type of the function used to represent the trend. The following options are allowed:
- "zero": The trend is set to zero. No process parameters are registered.
- "const": The trend is set to a constant value. No process parameters are registered. The constant value is specified by means of configuration option mean_value that expects a value of type float.
- "universal": The trend is represented as a parametrized function based on multivariate polynomials. The order of the polynomial is set using the configuration option mean_polyo that expects 0, 1, 2 or 3 as integer values. The parameters of the so-obtained polynomial are registered as process parameters. By default, the first process parameter (the intercept) is one and all other process parameters are set to zero.
- "ordinary": A user-defined function is assigned as trend function. For this type of trend function, a single process parameter is registered. It is one by default and scales the specified user-defined function of the trend. The user-defined function is specified by means of configuration option mean_fun that expects an expression of type flxParaFun (which accepts a Ndim-dimensional input-array).

For the covariance kernel, standard kernel types can be specified separately for each dimension:

kernel_lst (list of str): A list that specifies the kernel type associated with each dimension. The size of the list must match Ndim. The following types are supported:
- gauss: Guassian kernel
  
  \(\exp\left[-\left(\frac{x_1-x_2}{\lambda}\right)^2\right]\)
- exp: exponential kernel
  
  \(\exp\left[-\frac{\left|x_1-x_2\right|}{\lambda}\right]\)
In the equations above, \(\lambda\) specifies the correlation length.

For this type of kernel function, Ndim+1 process parameters are registered. The first controls the standard deviation of the Gaussian process, the other Ndim process parameters control the correlation length associated with each of the Ndim-kernels.

Alternatively, a user-defined function can be specified as kernel:

kernel_fun (flxParaFun): The user-defined covariance kernel function, which accepts a 2xNdim-dimensional input array.

For example, if Ndim=3, the interpretation of the input array is [X11, X12, X13, X21, X22, X23].

For this type of kernel function, no process parameters are registered.

Processes of type "mavgGP":

… represent an umbrella-class that contains a set of Gaussian processes of type "singleGP". For the final prediction, Bayesian model averaging is used to average over multiple Gaussian process models.

itermax (int, default: 500): The maximum number of iterations in the MLE optimization of each process model contained in the set (compare TODO[optimize]).

The optimization of the individual underlying process models is performed implicitly whenever the parent object is conditioned on new data. In this regard, the parent object differs from the object of a single process model (of type "singleGP") that is only optimized when explicitly requested by the user.

get_name()#

Retrieves the name of the Gaussian process.

Returns:: name of the Gaussian process
Return type:: str

get_descr()#

Retrieves the description of the Gaussian process.

Returns:: description of the Gaussian process
Return type:: str

noise_white(noise_sd)#

Specifies the standard deviation of the model error of the Gaussian process model. Set this to a value larger than zero if the Gaussian process model cannot interpolate the observed values of the model output exactly (due to uncertainty/noise).

Parameters:: noise_sd (float) – Standard deviation of white noise. Value must be positive or zero.
Return type:: None

condition_on(data_input, data_output, init_pvec=True, opt_noise=True)#

Condition a Gaussian process on input/output-data.

Parameters:

data_input (numpy.ndarray of shape (Ndim, N)) – Specifies N data-points of the model input.
data_output (numpy.ndarray of shape (N,)) – Specifies the model output associated with the N data-points of the model input.
init_pvec (bool) –
If set to True, the process parameters are initialized using a heuristic based on the statistics of the provided data.
- If the mean function is of type "ordinary" the specified function is scaled with the sample mean of the specified model output data (only if useLSE=False).
- If the mean function is of type "universal", the entire (polynomial-based) mean function is scaled with the sample mean of the specified model output data (only if useLSE=False). The process parameter that controls the intercept is set to one and all other mean-related process parameters are set to zero.
- The process parameter that controls the standard deviation is scaled using the sample standard deviation of the specified model output data.
- The process parameters that control the correlation length of the Ndim kernels are scaled using the sample standard deviation of the corresponding model input.
- Also an initial value for the standard deviation of noise is set (although it is not considered a standard process parameter). The value is selected as f*(sample standard deviation of the specified model output data), where f is 0.1 if opt_noise==True and 1e-4 otherwise.
opt_noise (bool) –
If set to True, the optimal standard deviation of the (white) noise associated with the process is identified through numerical optimization (the existing/old value is overwritten).

A log of the optimization of the standard deviation of the noise is stored internally. The logged messages of the optimization can be accessed using flx.gpr.gp.info().

Returns:

log-likelihood associated with the current values of the process parameters.

Return type:

float

The data in data_input and data_output is not copied and stored separately by the gp-model, but only a pointer to the data is stored as part of the gp-model. Therefore, one should be careful not to modify the content of the arrays associated with data_input and data_output after the gp-model was conditioned on the data. If the data associated with data_input and data_output is modified, the method flx.gpr.gp.unassemble() must be called.

optimize(itermax=500, opt_noise=False)#

Optimizes the process parameters of the Gaussian process model.

Parameters:

itermax (int) – The maximum number of iterations during MLE of the process parameters of the Gaussian process.
opt_noise (bool) –
- True: noise parameter is optimized.
- False: The standard deviation of the white noise associated with the Gaussian process model is not interpreted as a process parameter and, therefore, not optimized. To find the optimal noise for the current values of the process parameters, you can re-run flx.gpr.gp.condition_on() with init_pvec=True and opt_noise=True afterwards.

A log of the optimization is stored internally. The logged messages of the optimization can be accessed using flx.gpr.gp.info().

Important

Only process models of type "singleGP" are optimized by this function call. Process models of type "mavgGP" are optimized automatically whenever the umbrella process is condition on new data (by means of flx.gpr.gp.condition_on()).

Parameters:: opt_noise (bool) – If True, the standard deviation of the noise is treated as a process parameter and optimized.
Returns:: log-likelihood associated with the current values of the process parameters.
Return type:: float

predict(x_vec, type, predict_noise=False)#

Evaluate/predict quantities of the gp-model conditioned on the observations.

The behavior of the method depends on the value of type:

"mean": The mean value of the prediction at x_vec is returned. (float)

"sd": The standard deviation of the prediction at x_vec is returned. (float)

"mean_sd": A tuple of the mean value and the standard deviation of the prediction at x_vec is returned. (float, float)

"var": The variance of the prediction at x_vec is returned. (float)

"trend": The (prior) value of the trend function at x_vec is returned. (float)

"kernel": The (prior) value of the kernel evaluated at x_vec is returned. Note that the dimension of x_vec must equal 2*N_dim. The expected structure of x_vec equals the one documented in flx.gpr.gp.__init__(), for parameter kernel_fun. (float)

Return type:: Depends on predict_noise

info()#

Returns a dict containing information about the Gaussian process.

The returned dict has the following entries:

type (str): type of the Gaussian process

name (str): name of the Gaussian process

descr (str): description of the Gaussian process

mean (dict): information about the trend function of the Gaussian process

type (str): type of the trend function

Type "const" exports:

val (float): Constant value that represents the trend function.

Type "ordinary" exports:

para_vec (numpy.ndarray): Array of process parameters associated with the trend function.

Type "universal" exports:

para_vec (numpy.ndarray): Array of process parameters associated with the trend function.

normalizef (float): Scaling constant for the trend function.

kernel (dict): information about the (prior) kernel of the Gaussian process

type (str or list): type of the kernel

If a list of standard kernels is used, the following properties are additionally exported:

para_vec (numpy.ndarray): Array of process parameters associated with the kernel.

n_vec (numpy.ndarray): Array of scaling values linked to the process parameters.

kernel_sd (float): Standard deviation of the kernel.

noise (float): standard deviation of white noise

noise_log (str): Logging messages from the optimization of the standard deviation of noise.

opt_log (str): Logging messages from the optimization of the process parameters.

logl_obsv (float): Log-likelihood of the current observation conditional on the process parameters. This entry is only generated if the process is conditioned on a data set.

N_obsv (int): Total number of registered observations.

obsv_up2date (bool): True, if the data-set is considered to be “up to date”.

Return type:: dict

unassemble()#

Forces the gp-model to re-assemble the covariance matrix on the next call.

Return type:: None

gpr - Gaussian Process Regression

Contents

`gpr` - Gaussian Process Regression#

Theory#

basic definitions#

joint distribution model#

predictive distribution#

log marginal likelihood#

least squares estimation#

introduction#

least squares estimation of the trend function#

least squares estimation of \(\sigma_Y^2\)#

Syntax#

Examples#

gpr - Gaussian Process Regression

Contents

gpr - Gaussian Process Regression#

Theory#

basic definitions#

joint distribution model#

predictive distribution#

log marginal likelihood#

least squares estimation#

introduction#

least squares estimation of the trend function#

least squares estimation of \(\sigma_Y^2\)#

Syntax#

Examples#

`gpr` - Gaussian Process Regression#