trieste.models.gpflow#
This package contains the primary interface for Gaussian process models. It also contains a number of TrainableProbabilisticModel wrappers for GPflow-based models.
Submodules#
Package Contents#
- build_gpr(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace | None = None, kernel_priors: bool = True, likelihood_variance: float | None = None, trainable_likelihood: bool = False, kernel: gpflow.kernels.Kernel | None = None) gpflow.models.GPR [source]#
Build a GPR model with sensible initial parameters and priors. By default, we use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found these priors to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters:
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters. Required unless a kernel is passed.
kernel_priors – If set to True (default) priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance.
trainable_likelihood – If set to True, the Gaussian likelihood parameter is set to be trainable. By default set to False.
kernel – The kernel to use in the model, defaults to letting the function set up a Matern52 kernel.
- Returns:
A GPR model.
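For illustration, a minimal end-to-end sketch of this builder (hedged: the toy objective and sample sizes are placeholders of our own, not part of the package):

    import tensorflow as tf
    from trieste.data import Dataset
    from trieste.models.gpflow import GaussianProcessRegression, build_gpr
    from trieste.space import Box

    search_space = Box([0.0, 0.0], [1.0, 1.0])  # unit hypercube, as recommended above
    query_points = search_space.sample_sobol(16)
    observations = tf.reduce_sum(query_points**2, axis=-1, keepdims=True)  # toy objective
    dataset = Dataset(query_points, observations)

    gpr = build_gpr(dataset, search_space, likelihood_variance=1e-5)
    model = GaussianProcessRegression(gpr)  # wrapper documented further below
    model.optimize(dataset)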
- build_multifidelity_autoregressive_models(dataset: trieste.data.Dataset, num_fidelities: int, input_search_space: trieste.space.SearchSpace, likelihood_variance: float = 1e-06, kernel_priors: bool = False, trainable_likelihood: bool = False) Sequence[trieste.models.gpflow.models.GaussianProcessRegression] [source]#
Build the individual GPR models required for constructing a MultifidelityAutoregressive model with num_fidelities fidelities.
- Parameters:
dataset – Dataset of points with which to initialise the individual models, where the final column of the final dimension of the query points contains the fidelity
num_fidelities – Number of fidelities desired for the MultifidelityAutoregressive model
input_search_space – The input search space of the models
- Returns:
List of initialised GPR models
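A hedged sketch of how these models are typically combined with the MultifidelityAutoregressive wrapper documented below (assuming mf_dataset is a Dataset whose query points carry the fidelity in their final column, as required above; search_space as in the build_gpr sketch):

    from trieste.models.gpflow import (
        MultifidelityAutoregressive,
        build_multifidelity_autoregressive_models,
    )

    # mf_dataset query points have shape [N, D+1]; the final column is the fidelity
    gprs = build_multifidelity_autoregressive_models(
        mf_dataset, num_fidelities=2, input_search_space=search_space
    )
    model = MultifidelityAutoregressive(gprs)
    model.update(mf_dataset)
    model.optimize(mf_dataset)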
- build_multifidelity_nonlinear_autoregressive_models(dataset: trieste.data.Dataset, num_fidelities: int, input_search_space: trieste.space.SearchSpace, kernel_base_class: Type[gpflow.kernels.Stationary] = gpflow.kernels.Matern32, kernel_priors: bool = True, trainable_likelihood: bool = False) Sequence[trieste.models.gpflow.models.GaussianProcessRegression] [source]#
Build the models required for training the trieste.models.gpflow.MultifidelityNonlinearAutoregressive model.
This builds a basic Matern32 kernel for the lowest fidelity, and the custom kernel described in [PRD+17] for the higher fidelities, which also have an extra input dimension. Note that the initial data that the models with fidelity greater than 0 are initialised with contains dummy data in this extra dimension, so an update of the MultifidelityNonlinearAutoregressive is required to propagate real data through to these models.
- Parameters:
dataset – The dataset to use to initialise the models
num_fidelities – The number of fidelities to model
input_search_space – the search space, used to initialise the kernel parameters
kernel_base_class – a stationary kernel type
kernel_priors – If set to True (default) priors are set for kernel parameters (variance and lengthscale).
- Returns:
gprs: A list of GPR models that can be used for the multifidelity model
- build_sgpr(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, likelihood_variance: float | None = None, trainable_likelihood: bool = False, num_inducing_points: int | None = None, trainable_inducing_points: bool = False) gpflow.models.SGPR [source]#
Build an SGPR model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found these priors to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
For performance reasons, the number of inducing points should not be changed during Bayesian optimization. Hence, even if the initial dataset is smaller, we advise setting this to a higher number. By default, inducing points are set to Sobol samples for continuous search spaces, and to simple random samples for discrete or mixed search spaces. This carries the risk that optimization gets stuck if they are not trainable, which calls for adaptive inducing point selection during the optimization. This functionality will be added to Trieste in the future.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters:
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default) priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance.
trainable_likelihood – If set to True, the Gaussian likelihood parameter is set to be trainable. By default set to False.
num_inducing_points – The number of inducing points can be optionally set to a certain value. If left unspecified (default), this number is set to either NUM_INDUCING_POINTS_PER_DIM times the dimensionality of the search space or the value given by MAX_NUM_INDUCING_POINTS, whichever is smaller.
trainable_inducing_points – If set to True, inducing points will be set to be trainable. This option should be used with caution. By default set to False.
- Returns:
An SGPR model.
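A hedged sketch of pairing this builder with the SparseGaussianProcessRegression wrapper and an inducing point selector, both documented below (dataset and search_space as in the build_gpr sketch above):

    from trieste.models.gpflow import (
        ConditionalVarianceReduction,
        SparseGaussianProcessRegression,
        build_sgpr,
    )

    sgpr = build_sgpr(dataset, search_space, num_inducing_points=50)
    model = SparseGaussianProcessRegression(
        sgpr, inducing_point_selector=ConditionalVarianceReduction()
    )
    model.optimize(dataset)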
- build_svgp(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, classification: bool = False, kernel_priors: bool = True, likelihood_variance: float | None = None, trainable_likelihood: bool = False, num_inducing_points: int | None = None, trainable_inducing_points: bool = False) gpflow.models.SVGP [source]#
Build an SVGP model with sensible initial parameters and priors. Both regression and binary classification models are available. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found these priors to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
For performance reasons, the number of inducing points should not be changed during Bayesian optimization. Hence, even if the initial dataset is smaller, we advise setting this to a higher number. By default, inducing points are set to Sobol samples for continuous search spaces, and to simple random samples for discrete or mixed search spaces. This carries the risk that optimization gets stuck if they are not trainable, which calls for adaptive inducing point selection during the optimization. This functionality will be added to Trieste in the future.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters:
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
classification – If a classification model is needed, this should be set to True, in which case a Bernoulli likelihood will be used. If a regression model is required, this should be set to False (default), in which case a Gaussian likelihood is used.
kernel_priors – If set to True (default) priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance. This argument is ignored in the classification case.
trainable_likelihood – If set to True, the likelihood parameter is set to be trainable. By default set to False. This argument is ignored in the classification case.
num_inducing_points – The number of inducing points can be optionally set to a certain value. If left unspecified (default), this number is set to either NUM_INDUCING_POINTS_PER_DIM times the dimensionality of the search space or the value given by MAX_NUM_INDUCING_POINTS, whichever is smaller.
trainable_inducing_points – If set to True, inducing points will be set to be trainable. This option should be used with caution. By default set to False.
- Returns:
An SVGP model.
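A hedged sketch of the binary classification case, wrapped in the SparseVariational wrapper documented below (the thresholded labels are a placeholder of our own; query_points and search_space as in the build_gpr sketch above):

    import tensorflow as tf
    from trieste.data import Dataset
    from trieste.models.gpflow import SparseVariational, build_svgp

    labels = tf.cast(
        tf.reduce_sum(query_points, axis=-1, keepdims=True) > 1.0, tf.float64
    )  # binary observations in {0, 1}
    classification_data = Dataset(query_points, labels)

    svgp = build_svgp(classification_data, search_space, classification=True)
    model = SparseVariational(svgp)
    model.optimize(classification_data)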
- build_vgp_classifier(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, noise_free: bool = False, kernel_variance: float | None = None) gpflow.models.VGP [source]#
Build a VGP binary classification model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found these priors to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting. In the noise_free case we do not use a prior for the kernel variance parameter.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters:
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default) priors are set for kernel parameters (variance and lengthscale). In the noise_free case the kernel variance prior is not set.
noise_free – If there is prior information that the classification problem is deterministic, this should be set to True, in which case the kernel variance will be fixed to a higher default value CLASSIFICATION_KERNEL_VARIANCE_NOISE_FREE, leading to a sharper classification boundary. In this case the prior for the kernel variance parameter is also not set. By default set to False.
kernel_variance – Kernel variance parameter can be optionally set to a certain value. If left unspecified (default), the kernel variance is set to CLASSIFICATION_KERNEL_VARIANCE_NOISE_FREE in the noise_free case and to CLASSIFICATION_KERNEL_VARIANCE otherwise.
- Returns:
A VGP model.
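A hedged sketch, reusing classification_data from the build_svgp sketch above and the VariationalGaussianProcess wrapper documented below (we assume the wrapper's default optimizer is compatible with use_natgrads=True):

    from trieste.models.gpflow import VariationalGaussianProcess, build_vgp_classifier

    vgp = build_vgp_classifier(classification_data, search_space, noise_free=True)
    model = VariationalGaussianProcess(vgp, use_natgrads=True)
    model.optimize(classification_data)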
- class ConditionalImprovementReduction(recalc_every_model_update: bool = True)[source]#
Bases: DPPInducingPointSelector
An InducingPointSelector that greedily chooses points with large predictive variance and that are likely to be in promising regions of the search space, see [MOP23].
- Parameters:
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- class ConditionalVarianceReduction(recalc_every_model_update: bool = True)[source]#
Bases: DPPInducingPointSelector
An InducingPointSelector that greedily chooses the points with maximal (conditional) predictive variance, see [BRVDW19].
- Parameters:
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- class InducingPointSelector(recalc_every_model_update: bool = True)[source]#
Bases: abc.ABC, Generic[trieste.models.interfaces.ProbabilisticModelType]
This class provides functionality to update the inducing points of an inducing point-based model as the Bayesian optimization progresses.
The only constraint on subclasses of InducingPointSelector is that they preserve the shape of the inducing points, so as not to trigger expensive retracing.
It can often be beneficial to change the inducing points during optimization, for example to allow the model to focus its limited modelling resources on promising areas of the space. See [VMA+21] for demonstrations of some of our InducingPointSelectors.
- Parameters:
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- calculate_inducing_points(current_inducing_points: trieste.types.TensorType, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) trieste.types.TensorType [source]#
Calculate the new inducing points given the existing inducing points.
If recalc_every_model_update is set to False then we only generate new inducing points for the first calculate_inducing_points() call; on subsequent calls we just return the current inducing points.
- Parameters:
current_inducing_points – The current inducing points used by the model.
model – The sparse model.
dataset – The data from the observer.
- Returns:
The new updated inducing points.
- Raises:
NotImplementedError – If model has more than one set of inducing variables.
- abstract _recalculate_inducing_points(M: int, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) trieste.types.TensorType [source]#
Method for calculating new inducing points given a model and dataset.
This method is to be implemented by all subclasses of InducingPointSelector.
- Parameters:
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns:
The new updated inducing points.
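To make the subclassing contract concrete, a minimal (hypothetical) selector is sketched below; the class name and strategy are our own, not part of the package:

    import tensorflow as tf
    from trieste.data import Dataset
    from trieste.models.gpflow import GPflowPredictor
    from trieste.models.gpflow.inducing_point_selectors import InducingPointSelector

    class FirstMInducingPointSelector(InducingPointSelector[GPflowPredictor]):
        """Hypothetical selector that simply takes the first M training inputs."""

        def _recalculate_inducing_points(
            self, M: int, model: GPflowPredictor, dataset: Dataset
        ) -> tf.Tensor:
            # Illustrative only: assumes the dataset holds at least M points, so
            # the returned shape stays [M, D] and no retracing is triggered.
            return dataset.query_points[:M]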
- class KMeansInducingPointSelector(recalc_every_model_update: bool = True)[source]#
Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]
An InducingPointSelector that chooses points as the centroids of a K-means clustering of the training data.
- Parameters:
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) trieste.types.TensorType [source]#
Calculate M centroids from a K-means clustering of the training data.
If the clustering returns fewer than M centroids, or if we have fewer than M training points, then we fill the remaining points with random samples across the search space.
- Parameters:
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns:
The new updated inducing points.
- Raises:
tf.errors.InvalidArgumentError – If dataset is empty.
- class RandomSubSampleInducingPointSelector(recalc_every_model_update: bool = True)[source]#
Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]
An InducingPointSelector that chooses points at random from the training data.
- Parameters:
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) trieste.types.TensorType [source]#
Sample M points from the training data without replacement. If we require more inducing points than training data, then we fill the remaining points with random samples across the search space.
- Parameters:
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns:
The new updated inducing points.
- Raises:
tf.errors.InvalidArgumentError – If dataset is empty.
- class UniformInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)[source]#
Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]
An InducingPointSelector that chooses points sampled uniformly across the search space.
- Parameters:
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) trieste.types.TensorType [source]#
Sample M points. If search_space is a Box then we use a space-filling Sobol design to ensure high diversity.
- Parameters:
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns:
The new updated inducing points.
- class GPflowPredictor(optimizer: trieste.models.optimizer.Optimizer | None = None)[source]#
Bases: trieste.models.interfaces.SupportsPredictJoint, trieste.models.interfaces.SupportsGetKernel, trieste.models.interfaces.SupportsGetObservationNoise, trieste.models.interfaces.SupportsPredictY, trieste.models.interfaces.HasReparamSampler, trieste.models.interfaces.TrainableProbabilisticModel, abc.ABC
A trainable wrapper for a GPflow Gaussian process model.
- Parameters:
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
- property optimizer: trieste.models.optimizer.Optimizer#
The optimizer with which to train the model.
- abstract property model: gpflow.models.GPModel#
The underlying GPflow model.
- create_posterior_cache() None [source]#
Create a posterior cache for fast sequential predictions. Note that this must happen at initialisation and after we ensure the model data is variable. Furthermore, the cache must be updated whenever the underlying model is changed.
- _ensure_variable_model_data() None [source]#
Ensure GPflow data, which is normally stored in Tensors, is instead stored in dynamically shaped Variables. Override this as required.
- update_posterior_cache() None [source]#
Update the posterior cache. This needs to be called whenever the underlying model is changed.
- predict(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Return the mean and variance of the independent marginal distributions at each point in query_points.
This is essentially a convenience method for predict_joint(), where non-event dimensions of query_points are all interpreted as broadcasting dimensions instead of batch dimensions, and the covariance is squeezed to remove redundant nesting.
- Parameters:
query_points – The points at which to make predictions, of shape […, D].
- Returns:
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- predict_joint(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
- Parameters:
query_points – The points at which to make predictions, of shape […, B, D].
- Returns:
The mean and covariance of the joint marginal distribution at each batch of points in query_points. For a predictive distribution with event shape E, the mean will have shape […, B] + E, and the covariance shape […] + E + [B, B].
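To make these shape conventions concrete, a hedged sketch (assuming model is a trained single-output wrapper, so E = [1], over D = 2 inputs):

    query = tf.random.uniform([5, 3, 2], dtype=tf.float64)  # [..., B, D] with B = 3

    mean, cov = model.predict_joint(query)  # mean: [5, 3, 1], cov: [5, 1, 3, 3]
    mean_m, var_m = model.predict(query)    # mean_m and var_m: [5, 3, 1]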
- sample(query_points: trieste.types.TensorType, num_samples: int) trieste.types.TensorType [source]#
Return num_samples samples from the independent marginal distributions at query_points.
- Parameters:
query_points – The points at which to sample, with shape […, N, D].
num_samples – The number of samples at each point.
- Returns:
The samples. For a predictive distribution with event shape E, this has shape […, S, N] + E, where S is the number of samples.
- predict_y(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
- Parameters:
query_points – The points at which to make predictions, of shape […, D].
- Returns:
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- get_mean_function() gpflow.mean_functions.MeanFunction [source]#
Return the mean function of the model.
- Returns:
The mean function.
- get_observation_noise() trieste.types.TensorType [source]#
Return the variance of observation noise for homoscedastic likelihoods.
- Returns:
The observation noise.
- Raises:
NotImplementedError – If the model does not have a homoscedastic likelihood.
- log(dataset: trieste.data.Dataset | None = None) None [source]#
Log model training information at a given optimization step to TensorBoard. We log kernel and likelihood parameters. We also log several metrics based on the training data, such as the root mean square error between predictions and observations.
- Parameters:
dataset – Optional data that can be used to log additional data-based model summaries.
- reparam_sampler(num_samples: int) trieste.models.interfaces.ReparametrizationSampler[GPflowPredictor] [source]#
Return a reparametrization sampler providing num_samples samples.
- Returns:
The reparametrization sampler.
- class GaussianProcessRegression(model: gpflow.models.GPR, optimizer: trieste.models.optimizer.Optimizer | None = None, num_kernel_samples: int = 10, num_rff_features: int = 1000, use_decoupled_sampler: bool = True)[source]#
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.FastUpdateModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInternalData, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow GPR.
As Bayesian optimization requires a large number of sequential predictions (i.e. when maximizing acquisition functions), rather than calling the model directly at prediction time we instead call the posterior objects built by these models. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters:
model – The GPflow model to wrap.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
num_kernel_samples – Number of randomly sampled kernels (for each kernel parameter) to evaluate before beginning model optimization. Therefore, for a kernel with p (vector-valued) parameters, we evaluate p * num_kernel_samples kernels.
num_rff_features – The number of random Fourier features used to approximate the kernel when calling trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
use_decoupled_sampler – If True use a decoupled random Fourier feature sampler, else just use a random Fourier feature sampler. The decoupled sampler suffers less from overestimating variance and can typically get away with a lower num_rff_features.
- property model: gpflow.models.GPR#
The underlying GPflow model.
- _ensure_variable_model_data() None [source]#
Ensure GPflow data, which is normally stored in Tensors, is instead stored in dynamically shaped Variables. Override this as required.
- predict_y(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
- Parameters:
query_points – The points at which to make predictions, of shape […, D].
- Returns:
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(dataset: trieste.data.Dataset) None [source]#
Update the model given the specified dataset. Does not train the model.
- Parameters:
dataset – The data with which to update the model.
- covariance_between_points(query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) trieste.types.TensorType [source]#
Compute the posterior covariance between sets of query points.
\[\Sigma_{12} = K_{12} - K_{x1}(K_{xx} + \sigma^2 I)^{-1}K_{x2}\]
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters:
query_points_1 – Set of query points with shape […, N, D]
query_points_2 – Sets of query points with shape [M, D]
- Returns:
Covariance matrix between the sets of query points with shape […, L, N, M] (L being the number of latent GPs = number of output dimensions)
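A hedged shape sketch (assuming model is a trained single-output GaussianProcessRegression over D = 2 inputs, so L = 1):

    x1 = tf.random.uniform([7, 4, 2], dtype=tf.float64)  # [..., N, D]
    x2 = tf.random.uniform([3, 2], dtype=tf.float64)     # [M, D], must be rank 2
    cov = model.covariance_between_points(x1, x2)        # [7, 1, 4, 3] = [..., L, N, M]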
- optimize(dataset: trieste.data.Dataset) trieste.models.optimizer.OptimizeResult [source]#
Optimize the model with the specified dataset.
For GaussianProcessRegression, we (optionally) try multiple randomly sampled kernel parameter configurations as well as the configuration specified when initializing the kernel. The best configuration is used as the starting point for model optimization.
For trainable parameters constrained to lie in a finite interval (through a sigmoid bijector), we begin model optimization from the best of a random sample from these parameters’ acceptable domains.
For trainable parameters without constraints but with priors, we begin model optimization from the best of a random sample from these parameters’ priors.
For trainable parameters with neither priors nor constraints, we begin optimization from their initial values.
- Parameters:
dataset – The data with which to optimize the model.
- find_best_model_initialization(num_kernel_samples: int) None [source]#
Test num_kernel_samples models with sampled kernel parameters. The model’s kernel parameters are then set to the sample achieving maximal likelihood.
- Parameters:
num_kernel_samples – Number of randomly sampled kernels to evaluate.
- trajectory_sampler() trieste.models.interfaces.TrajectorySampler[GaussianProcessRegression] [source]#
Return a trajectory sampler. For GaussianProcessRegression, we build trajectories using a random Fourier feature approximation.
At the moment only models with a single latent GP are supported.
- Returns:
The trajectory sampler.
- Raises:
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
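A hedged usage sketch (the [N, B, D] input convention for trajectory functions is assumed from trieste's trajectory samplers):

    sampler = model.trajectory_sampler()
    trajectory = sampler.get_trajectory()  # one approximate draw from the posterior

    at = tf.random.uniform([100, 1, 2], dtype=tf.float64)  # [N, B, D] with B = 1
    values = trajectory(at)  # [N, B, L] = [100, 1, 1] for a single-output model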
- get_internal_data() trieste.data.Dataset [source]#
Return the model’s training data.
- Returns:
The model’s training data.
- conditional_predict_f(query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Returns the marginal GP distribution at query_points conditioned on both the model and some additional data, using the exact formulae. See [CGE14] (eqs. 8-10) for details.
- Parameters:
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns:
mean_qp_new: predictive mean at query_points, with shape […, M, L], and var_qp_new: predictive variance at query_points, with shape […, M, L]
- conditional_predict_joint(query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Predicts the joint GP distribution at query_points conditioned on both the model and some additional data, using the exact formulae. See [CGE14] (eqs. 8-10) for details.
- Parameters:
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns:
mean_qp_new: predictive mean at query_points, with shape […, M, L], and cov_qp_new: predictive covariance between query_points, with shape […, L, M, M]
- conditional_predict_f_sample(query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset, num_samples: int) trieste.types.TensorType [source]#
Generates samples of the GP at query_points conditioned on both the model and some additional data.
- Parameters:
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
num_samples – number of samples
- Returns:
samples of f at query points, with shape […, num_samples, M, L]
- conditional_predict_y(query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Predicts the distribution of observations y from the GP at query_points conditioned on both the model and some additional data.
- Parameters:
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns:
predictive mean at query_points, with shape […, M, L], and predictive variance at query_points, with shape […, M, L]
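Together, these methods support "fantasizing" pending observations without retraining. A hedged sketch (x_pending, y_fantasy and x_query are placeholders of our own):

    # condition on hypothetical observations at pending points, then predict
    pending = Dataset(x_pending, y_fantasy)  # [N, D] query points, [N, L] observations
    mean, var = model.conditional_predict_f(x_query, additional_data=pending)
    f_samples = model.conditional_predict_f_sample(x_query, pending, num_samples=10)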
- class MultifidelityAutoregressive(fidelity_models: Sequence[GaussianProcessRegression])[source]#
Bases: trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.interfaces.SupportsPredictY, trieste.models.interfaces.SupportsCovarianceWithTopFidelity
A TrainableProbabilisticModel implementation of the model from [KOHagan00]. This is a multi-fidelity model that works with an arbitrary number of fidelities. It relies on there being a linear relationship between fidelities, and may not perform well for more complex relationships. Precisely, it models the relationship between sequential fidelities as
\[f_{i}(x) = \rho f_{i-1}(x) + \delta(x)\]
where \(\rho\) is a scalar and \(\delta\) models the residual between the fidelities. The only base models supported in this implementation are GPR models. Note: Currently only supports single output problems.
- Parameters:
fidelity_models – List of GaussianProcessRegression models, one for each fidelity. The model at index 0 will be used as the signal model for the lowest fidelity, and models at higher indices will be used as the residual models for each higher fidelity.
- predict(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Predict the marginal mean and variance at query_points.
- Parameters:
query_points – Query points with shape [N, D+1], where the final column of the final dimension contains the fidelity of the query point
- Returns:
mean: The mean at query_points with shape [N, P], and var: The variance at query_points with shape [N, P]
- _calculate_residual(dataset: trieste.data.Dataset, fidelity: int) trieste.types.TensorType [source]#
Calculate the true residuals for a set of datapoints at a given fidelity.
The dataset should be made up of points for which you have observations at fidelity fidelity. The residuals calculated here are the difference between the data and the prediction at the lower fidelity multiplied by the rho value at this fidelity. This produces the training data for the residual models.
\[r_{i} = y - \rho_{i} * f_{i-1}(x)\]
- Parameters:
dataset – Dataset of points for which to calculate the residuals. Must have observations at fidelity fidelity. Query points shape is [N, D], observations is [N,P].
fidelity – The fidelity for which to calculate the residuals
- Returns:
The true residuals at given datapoints for given fidelity, shape is [N,1].
- sample(query_points: trieste.types.TensorType, num_samples: int) trieste.types.TensorType [source]#
Sample num_samples samples from the posterior distribution at query_points.
- Parameters:
query_points – The query points at which to sample of shape [N, D+1], where the final column of the final dimension contains the fidelity of the query point
num_samples – The number of samples (S) to generate for each query point.
- Returns:
samples from the posterior of shape […, S, N, P]
- predict_y(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Predict the marginal mean and variance at query_points, including observation noise.
- Parameters:
query_points – Query points with shape […, N, D+1], where the final column of the final dimension contains the fidelity of the query point
- Returns:
mean: The mean at query_points with shape [N, P], and var: The variance at query_points with shape [N, P]
- update(dataset: trieste.data.Dataset) None [source]#
Update the models on their corresponding data. The data for each model is extracted by splitting the observations in dataset by fidelity level.
- Parameters:
dataset – The query points and observations for all the wrapped models.
- optimize(dataset: trieste.data.Dataset) None [source]#
Optimize all the models on their corresponding data. The data for each model is extracted by splitting the observations in dataset by fidelity level. Note that we have to code up a custom loss function when optimizing our residual model, so that we can include the correlation parameter as an optimisation variable.
- Parameters:
dataset – The query points and observations for all the wrapped models.
- covariance_with_top_fidelity(query_points: trieste.types.TensorType) trieste.types.TensorType [source]#
Calculate the covariance of the output at query_point and a given fidelity with the highest fidelity output at the same query_point.
- Parameters:
query_points – The query points to calculate the covariance for, of shape [N, D+1], where the final column of the final dimension contains the fidelity of the query point
- Returns:
The covariance with the top fidelity for the query_points, of shape [N, P]
- log(dataset: trieste.data.Dataset | None = None) None [source]#
Log model-specific information at a given optimization step.
- Parameters:
dataset – Optional data that can be used to log additional data-based model summaries.
- class MultifidelityNonlinearAutoregressive(fidelity_models: Sequence[GaussianProcessRegression], num_monte_carlo_samples: int = 100)[source]#
Bases: trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.interfaces.SupportsPredictY, trieste.models.interfaces.SupportsCovarianceWithTopFidelity
A TrainableProbabilisticModel implementation of the model from [PRD+17]. This is a multifidelity model that works with an arbitrary number of fidelities. It is capable of modelling both linear and non-linear relationships between fidelities. It models the relationship between sequential fidelities as
\[f_{i}(x) = g_{i}(x, f_{*i-1}(x))\]
where \(f_{*i-1}\) is the posterior of the previous fidelity. The only base models supported in this implementation are GPR models. Note: Currently only supports single output problems.
- Parameters:
fidelity_models – List of GaussianProcessRegression models, one for each fidelity. The model at index 0 should take inputs with the same number of dimensions as x and can use any kernel, whilst the later models should take an extra input dimension, and use the kernel described in [PRD+17].
num_monte_carlo_samples – The number of Monte Carlo samples to use for the sections of prediction and sampling that require the use of Monte Carlo methods.
- sample(query_points: trieste.types.TensorType, num_samples: int) trieste.types.TensorType [source]#
Return num_samples samples from the independent marginal distributions at query_points.
- Parameters:
query_points – The points at which to sample, with shape […, N, D].
num_samples – The number of samples at each point.
- Returns:
The samples, with shape […, S, N], where S is the number of samples.
- predict(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Predict the marginal mean and variance at query_points.
- Parameters:
query_points – Query points with shape […, N, D+1], where the final column of the final dimension contains the fidelity of the query point
- Returns:
mean: The mean at query_points with shape […, N, P], and var: The variance at query_points with shape […, N, P]
- predict_y(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Predict the marginal mean and variance at query_points, including observation noise.
- Parameters:
query_points – Query points with shape […, N, D+1], where the final column of the final dimension contains the fidelity of the query point
- Returns:
mean: The mean at query_points with shape [N, P], and var: The variance at query_points with shape [N, P]
- _sample_mean_and_var_at_fidelities(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Draw num_monte_carlo_samples samples of mean and variance from the model at the fidelities passed in the final column of the query points.
- Parameters:
query_points – Query points with shape […, N, D+1], where the final column of the final dimension contains the fidelity of the query point
- Returns:
sample_mean: Samples of the mean at the query points with shape […, N, 1, S] and sample_var: Samples of the variance at the query points with shape […, N, 1, S]
- _propagate_samples_through_level(query_point: trieste.types.TensorType, fidelity: int, sample_mean: trieste.types.TensorType, sample_var: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Propagate samples through a given fidelity.
This takes a set of query points without a fidelity column and calculates samples at the given fidelity, using the sample means and variances from the previous fidelity.
- Parameters:
query_points – The query points to sample at, with no fidelity column, with shape […, N, D]
fidelity – The fidelity to propagate the samples through
sample_mean – Samples of the posterior mean at the previous fidelity, with shape […, N, 1, S]
sample_var – Samples of the posterior variance at the previous fidelity, with shape […, N, 1, S]
- Returns:
sample_mean: Samples of the posterior mean at the given fidelity, of shape […, N, 1, S] and sample_var: Samples of the posterior variance at the given fidelity, of shape […, N, 1, S]
- update(dataset: trieste.data.Dataset) None [source]#
Update the models on their corresponding data. The data for each model is extracted by splitting the observations in dataset by fidelity level.
- Parameters:
dataset – The query points and observations for all the wrapped models.
- optimize(dataset: trieste.data.Dataset) None [source]#
Optimize all the models on their corresponding data. The data for each model is extracted by splitting the observations in dataset by fidelity level.
- Parameters:
dataset – The query points and observations for all the wrapped models.
- covariance_with_top_fidelity(query_points: trieste.types.TensorType) trieste.types.TensorType [source]#
Calculate the covariance of the output at query_point and a given fidelity with the highest fidelity output at the same query_point.
- Parameters:
query_points – The query points to calculate the covariance for, of shape [N, D+1], where the final column of the final dimension contains the fidelity of the query point
- Returns:
The covariance with the top fidelity for the query_points, of shape [N, P]
- log(dataset: trieste.data.Dataset | None = None) None [source]#
Log model-specific information at a given optimization step.
- Parameters:
dataset – Optional data that can be used to log additional data-based model summaries.
- class SparseGaussianProcessRegression(model: gpflow.models.SGPR, optimizer: trieste.models.optimizer.Optimizer | None = None, num_rff_features: int = 1000, inducing_point_selector: trieste.models.gpflow.inducing_point_selectors.InducingPointSelector[SparseGaussianProcessRegression] | None = None)[source]#
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.SupportsGetInternalData, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow SGPR. At the moment we only support models with a single latent GP. This is due to the compute_qu method in SGPR, used for computing covariance between query points and for trajectory sampling, which at the moment works only for a single latent GP.
Similarly to our GaussianProcessRegression, our SGPR wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters:
model – The GPflow model to wrap.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
num_rff_features – The number of random Fourier features used to approximate the kernel when calling trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
inducing_point_selector – The (optional) desired inducing point selector that will update the underlying GPflow SGPR model’s inducing points as the optimization progresses.
- Raises:
NotImplementedError (or ValueError) – If we try to use a model with an invalid num_rff_features, or an inducing_point_selector with a model that has more than one set of inducing points.
- property model: gpflow.models.SGPR#
The underlying GPflow model.
- predict_y(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
- Parameters:
query_points – The points at which to make predictions, of shape […, D].
- Returns:
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- _ensure_variable_model_data() None [source]#
Ensure GPflow data, which is normally stored in Tensors, is instead stored in dynamically shaped Variables. Override this as required.
- optimize(dataset: trieste.data.Dataset) trieste.models.optimizer.OptimizeResult [source]#
Optimize the model with the specified dataset.
- Parameters:
dataset – The data with which to optimize the model.
- update(dataset: trieste.data.Dataset) None [source]#
Update the model given the specified dataset. Does not train the model.
- Parameters:
dataset – The data with which to update the model.
- _update_inducing_variables(new_inducing_points: trieste.types.TensorType) None [source]#
When updating the inducing points of a model, we must also update the other inducing variables, i.e. q_mu and q_sqrt, accordingly. The exact form of this update depends on whether we are using whitened representations of the inducing variables. See _whiten_points() for details.
- Parameters:
new_inducing_points – The desired values for the new inducing points.
- Raises:
NotImplementedError – If we try to update the inducing variables of a model that has more than one set of inducing points.
- get_inducing_variables() Tuple[trieste.types.TensorType | list[trieste.types.TensorType], trieste.types.TensorType, trieste.types.TensorType, bool] [source]#
Return the model’s inducing variables. The SGPR model does not have q_mu, q_sqrt and whiten objects. We can use the compute_qu method to obtain q_mu and q_sqrt, while the SGPR model does not use the whitened representation. Note that at the moment compute_qu works only for a single latent GP, and returns q_sqrt in a shape that is inconsistent with the SVGP model (hence we need to modify its shape).
- Returns:
The inducing points (i.e. locations of the inducing variables), as a Tensor or a list of Tensors (when the model has multiple inducing points); a tensor containing the variational mean q_mu; a tensor containing the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- Raises:
NotImplementedError – If the model has more than one latent GP.
- covariance_between_points(query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) trieste.types.TensorType [source]#
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters:
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns:
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- trajectory_sampler() trieste.models.interfaces.TrajectorySampler[SparseGaussianProcessRegression] [source]#
Return a trajectory sampler. For SparseGaussianProcessRegression, we build trajectories using a decoupled random Fourier feature approximation. Note that this is available only for single output models.
At the moment only models with a single latent GP are supported.
- Returns:
The trajectory sampler.
- Raises:
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- get_internal_data() trieste.data.Dataset [source]#
Return the model’s training data.
- Returns:
The model’s training data.
- class SparseVariational(model: gpflow.models.SVGP, optimizer: trieste.models.optimizer.Optimizer | None = None, num_rff_features: int = 1000, inducing_point_selector: trieste.models.gpflow.inducing_point_selectors.InducingPointSelector[SparseVariational] | None = None)[source]#
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow SVGP.
Similarly to our GaussianProcessRegression, our SVGP wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters:
model – The underlying GPflow sparse variational model.
optimizer – The optimizer with which to train the model. Defaults to BatchOptimizer with Adam with batch size 100.
num_rff_features – The number of random Fourier features used to approximate the kernel when performing decoupled Thompson sampling through its trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
inducing_point_selector – The (optional) desired inducing point selector that will update the underlying GPflow sparse variational model’s inducing points as the optimization progresses.
- Raises:
NotImplementedError – If we try to use an inducing_point_selector with a model that has more than one set of inducing points.
- property model: gpflow.models.SVGP#
The underlying GPflow model.
- _ensure_variable_model_data() None [source]#
Ensure GPflow data, which is normally stored in Tensors, is instead stored in dynamically shaped Variables. Override this as required.
- predict_y(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
- Parameters:
query_points – The points at which to make predictions, of shape […, D].
- Returns:
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(dataset: trieste.data.Dataset) None [source]#
Update the model given the specified dataset. Does not train the model.
- Parameters:
dataset – The data with which to update the model.
- optimize(dataset: trieste.data.Dataset) trieste.models.optimizer.OptimizeResult [source]#
Optimize the model with the specified dataset.
- Parameters:
dataset – The data with which to optimize the model.
- _update_inducing_variables(new_inducing_points: trieste.types.TensorType) None [source]#
When updating the inducing points of a model, we must also update the other inducing variables, i.e. q_mu and q_sqrt, accordingly. The exact form of this update depends on whether we are using whitened representations of the inducing variables. See _whiten_points() for details.
- Parameters:
new_inducing_points – The desired values for the new inducing points.
- Raises:
NotImplementedError – If we try to update the inducing variables of a model that has more than one set of inducing points.
- get_inducing_variables() Tuple[trieste.types.TensorType | list[trieste.types.TensorType], trieste.types.TensorType, trieste.types.TensorType, bool] [source]#
Return the model’s inducing variables.
- Returns:
The inducing points (i.e. locations of the inducing variables), as a Tensor or a list of Tensors (when the model has multiple inducing points); A tensor containing the variational mean q_mu; a tensor containing the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- covariance_between_points(query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) trieste.types.TensorType [source]#
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters:
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns:
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- trajectory_sampler() trieste.models.interfaces.TrajectorySampler[SparseVariational] [source]#
Return a trajectory sampler. For SparseVariational, we build trajectories using a decoupled random Fourier feature approximation.
- Returns:
The trajectory sampler.
- class VariationalGaussianProcess(model: gpflow.models.VGP, optimizer: trieste.models.optimizer.Optimizer | None = None, use_natgrads: bool = False, natgrad_gamma: float | None = None, num_rff_features: int = 1000)[source]#
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow VGP.
A Variational Gaussian Process (VGP) approximates the posterior of a GP using the multivariate Gaussian closest to the posterior of the GP by minimizing the KL divergence between the approximated and exact posteriors. See [OA09] for details.
The VGP provides (approximate) GP modelling under non-Gaussian likelihoods, for example when fitting a classification model over binary data.
A whitened representation and (optional) natural gradient steps are used to aid model optimization.
Similarly to our GaussianProcessRegression, our VGP wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters:
model – The GPflow VGP.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
use_natgrads – If True then alternate model optimization steps with natural gradient updates. Note that natural gradients require a BatchOptimizer wrapper with an Optimizer optimizer.
num_rff_features – The number of random Fourier features used to approximate the kernel when performing decoupled Thompson sampling through its trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
natgrad_gamma – Gamma parameter for the natural gradient optimizer.
- Raises:
ValueError (or InvalidArgumentError) – If model’s q_sqrt is not rank 3, or if attempting to combine natural gradients with a Scipy optimizer.
- property model: gpflow.models.VGP#
The underlying GPflow model.
- _ensure_variable_model_data() None [source]#
Ensure GPflow data, which is normally stored in Tensors, is instead stored in dynamically shaped Variables. Override this as required.
- predict_y(query_points: trieste.types.TensorType) tuple[trieste.types.TensorType, trieste.types.TensorType] [source]#
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
- Parameters:
query_points – The points at which to make predictions, of shape […, D].
- Returns:
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(dataset: trieste.data.Dataset, *, jitter: float = DEFAULTS.JITTER) None [source]#
Update the model given the specified dataset. Does not train the model.
- Parameters:
dataset – The data with which to update the model.
jitter – The size of the jitter to use when stabilizing the Cholesky decomposition of the covariance matrix.
- optimize(dataset: trieste.data.Dataset) trieste.models.optimizer.OptimizeResult | None [source]#
VariationalGaussianProcess has a custom optimize method that (optionally) permits alternating between standard optimization steps (for kernel parameters) and natural gradient steps for the variational parameters (q_mu and q_sqrt). See [SEH18] for details. Using natural gradients can dramatically speed up model fitting, especially for ill-conditioned posteriors.
If using natural gradients, our optimizer inherits the mini-batch behavior and number of optimization steps from the base optimizer specified when initializing the VariationalGaussianProcess.
- get_inducing_variables() Tuple[trieste.types.TensorType, trieste.types.TensorType, trieste.types.TensorType, bool] [source]#
Return the model’s inducing variables. Note that GPflow’s VGP model is hard-coded to use the whitened representation.
- Returns:
Tensors containing: the inducing points (i.e. locations of the inducing variables); the variational mean q_mu; the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- trajectory_sampler() trieste.models.interfaces.TrajectorySampler[VariationalGaussianProcess] [source]#
Return a trajectory sampler. For VariationalGaussianProcess, we build trajectories using a decoupled random Fourier feature approximation.
At the moment, only models with a single latent GP are supported.
- Returns:
The trajectory sampler.
- Raises:
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- covariance_between_points(query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) trieste.types.TensorType [source]#
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters:
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Set of query points with shape [B, D]
- Returns:
Covariance matrix between the sets of query points, with shape […, L, A, B] (where L is the number of latent GPs, equal to the number of output dimensions)
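For example, for a single-output model (L = 1) over 2-dimensional inputs (the fitted model itself is assumed here):

import tensorflow as tf

# `model` is assumed to be a fitted VariationalGaussianProcess with D = 2.
qp1 = tf.random.uniform([4, 3, 2], dtype=tf.float64)  # shape [..., A, D] with A = 3
qp2 = tf.random.uniform([5, 2], dtype=tf.float64)     # shape [B, D] with B = 5
cov = model.covariance_between_points(qp1, qp2)       # shape [4, 1, 3, 5]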
- class BatchReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.SupportsPredictJoint, qmc: bool = False, qmc_skip: bool = True)[source]#
Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.SupportsPredictJoint]
This sampler employs the reparameterization trick to approximate batches of samples from a ProbabilisticModel’s predictive joint distribution as
\[x \mapsto \mu(x) + \epsilon L(x)\]
where \(L\) is the Cholesky factor s.t. \(LL^T\) is the covariance, and \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
- Parameters:
sample_size – The number of samples for each batch of points. Must be positive.
model – The model to sample from.
qmc – Whether to use QMC Sobol sampling instead of random normal sampling. QMC sampling more accurately approximates a normal distribution than truly random samples.
qmc_skip – Whether to use the skip parameter to ensure the QMC sampler gives different samples whenever it is reset. This is not supported with XLA.
- Raises:
ValueError (or InvalidArgumentError) – If sample_size is not positive.
- skip: trieste.types.TensorType#
Number of Sobol sequence points to skip. This is incremented for each sampler.
- sample(at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) trieste.types.TensorType [source]#
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given BatchReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different BatchReparametrizationSampler instances will produce different samples.
- Parameters:
at – Batches of query points at which to sample the predictive distribution, with shape […, B, D], for batches of size B of points of dimension D. Must have a consistent batch size across all calls to sample() for any given BatchReparametrizationSampler.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns:
The samples, of shape […, S, B, L], where S is the sample_size, B the number of points per batch, and L the dimension of the model’s predictive distribution.
- Raises:
ValueError (or InvalidArgumentError) – If any of the following are true: at is a scalar; the batch size B of at is not positive; the batch size B of at differs from that of previous calls; or jitter is negative.
- class DecoupledTrajectorySampler(model: FeatureDecompositionInducingPointModel | FeatureDecompositionInternalDataModel, num_features: int = 1000)[source]#
Bases: FeatureDecompositionTrajectorySampler[Union[FeatureDecompositionInducingPointModel, FeatureDecompositionInternalDataModel]]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model using decoupled sampling. See [WBT+20] for an introduction to decoupled sampling.
Unlike our RandomFourierFeatureTrajectorySampler, which uses an RFF decomposition to approximate the Gaussian process posterior, a DecoupledTrajectorySampler uses an RFF decomposition only to approximate the Gaussian process prior, and instead uses a canonical decomposition to discretize the effect of updating the prior on the given data.
\[\hat{f}(.) = \sum_{i=1}^L w_i\phi_i(.) + \sum_{j=1}^m v_jk(.,z_j)\]
where \(\phi_i(.)\) and \(w_i\) are the Fourier features and their weights that discretize the prior. In contrast, \(k(.,z_j)\) and \(v_j\) are the canonical features and their weights that discretize the data update.
The expression for \(v_j\) depends on whether we are using an exact Gaussian process or a sparse approximation. See eq. (13) in [WBT+20] for details.
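As a sketch of the intended use (the fitted model is assumed; it must satisfy one of the two feature-decomposition interfaces above):

import tensorflow as tf
from trieste.models.gpflow import DecoupledTrajectorySampler

sampler = DecoupledTrajectorySampler(model, num_features=500)
trajectory = sampler.get_trajectory()

# Trajectory functions take points of shape [N, B, D]; the draw is
# deterministic, so it can be evaluated (and optimized) anywhere.
query_points = tf.random.uniform([10, 1, 2], dtype=tf.float64)
values = trajectory(query_points)  # shape [10, 1, L]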
Note that if a model is of both FeatureDecompositionInducingPointModel type and FeatureDecompositionInternalDataModel type, FeatureDecompositionInducingPointModel takes priority and inducing points will be used for computations rather than data.
- Parameters:
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises:
NotImplementedError – If the model is not of valid type.
- _prepare_weight_sampler() Callable[[int], trieste.types.TensorType] [source]#
Prepare the sampler function that provides samples of the feature weights for both the RFF and canonical feature functions, i.e. we return a function that takes in a batch size B and returns B samples for the weights of each of the F RFF features and M canonical features for L outputs.
- class IndependentReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.ProbabilisticModel, qmc: bool = False, qmc_skip: bool = True)[source]#
Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.ProbabilisticModel]
This sampler employs the reparameterization trick to approximate samples from a ProbabilisticModel’s predictive distribution as
\[x \mapsto \mu(x) + \epsilon \sigma(x)\]
where \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
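For example (assuming `model` is any trieste probabilistic model over 2-dimensional inputs):

import tensorflow as tf
from trieste.models.gpflow import IndependentReparametrizationSampler

sampler = IndependentReparametrizationSampler(sample_size=50, model=model)

# Points must carry an explicit middle dimension of 1: shape [..., 1, D].
at = tf.random.uniform([7, 1, 2], dtype=tf.float64)
samples = sampler.sample(at)  # shape [7, 50, 1, L]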
- Parameters:
sample_size – The number of samples to take at each point. Must be positive.
model – The model to sample from.
qmc – Whether to use QMC Sobol sampling instead of random normal sampling. QMC sampling more accurately approximates a normal distribution than truly random samples.
qmc_skip – Whether to use the skip parameter to ensure the QMC sampler gives different samples whenever it is reset. This is not supported with XLA.
- Raises:
ValueError (or InvalidArgumentError) – If sample_size is not positive.
- skip: trieste.types.TensorType#
Number of Sobol sequence points to skip. This is incremented for each sampler.
- sample(at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) trieste.types.TensorType [source]#
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given IndependentReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different IndependentReparametrizationSampler instances will produce different samples.
- Parameters:
at – Where to sample the predictive distribution, with shape […, 1, D], for points of dimension D.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns:
The samples, of shape […, S, 1, L], where S is the sample_size and L is the number of latent model dimensions.
- Raises:
ValueError (or InvalidArgumentError) – If at has an invalid shape or jitter is negative.
- class RandomFourierFeatureTrajectorySampler(model: FeatureDecompositionInternalDataModel, num_features: int = 1000)[source]#
Bases: FeatureDecompositionTrajectorySampler[FeatureDecompositionInternalDataModel]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model. For tractability, the Gaussian process is approximated with a Bayesian linear model across a set of features sampled from the Fourier feature decomposition of the model’s kernel. See [HernandezLHG14] for details. Currently we do not support models with multiple latent Gaussian processes.
In particular, we approximate the Gaussian processes’ posterior samples as the finite feature approximation
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]where \(\phi_i\) are m Fourier features and \(\theta_i\) are feature weights sampled from a posterior distribution that depends on the feature values at the model’s datapoints.
Our implementation follows [HernandezLHG14], with our calculations differing slightly depending on properties of the problem. In particular, we use different calculation strategies depending on the number of considered features m and the number of data points n.
If \(m<n\) then we follow Appendix A of [HernandezLHG14] and calculate the posterior distribution for \(\theta\) following their Bayesian linear regression motivation, i.e. the computation revolves around an O(m^3) inversion of a design matrix.
If \(n<m\) then we use the kernel trick to recast the computation to revolve around an O(n^3) inversion of a Gram matrix. As well as being more efficient in early BO steps (where \(n<m\)), this second computation method allows much larger choices of m (as required to approximate very flexible kernels).
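The choice between the two strategies can be pictured with the following sketch; the attribute names here are illustrative stand-ins, while the two methods are the private ones documented below:

def _theta_posterior(self):
    # With fewer features than data points, the O(m^3) design-matrix
    # inversion (Appendix A of [HernandezLHG14]) is cheaper.
    if self._num_features < self._num_data:
        return self._prepare_theta_posterior_in_design_space()
    # Otherwise the O(n^3) Gram-matrix inversion wins, and also
    # permits very large m for highly flexible kernels.
    return self._prepare_theta_posterior_in_gram_space()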
- Parameters:
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically perfoms well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises:
ValueError – If dataset is empty.
- _prepare_weight_sampler() Callable[[int], trieste.types.TensorType] [source]#
Calculate the posterior of theta (the feature weights) for the RFFs, returning a function that takes in a batch size B and returns B samples for the weights of each of the F RFF features for one output.
- _prepare_theta_posterior_in_design_space() tensorflow_probability.distributions.MultivariateNormalTriL [source]#
Calculate the posterior of theta (the feature weights) in the design space. This distribution is a Gaussian
\[\theta \sim N(D^{-1}\Phi^Ty,D^{-1}\sigma^2)\]
where the [m,m] design matrix \(D=(\Phi^T\Phi + \sigma^2I_m)\) is defined for the [n,m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
- _prepare_theta_posterior_in_gram_space() tensorflow_probability.distributions.MultivariateNormalTriL [source]#
Calculate the posterior of theta (the feature weights) in the gram space.
\[\theta \sim N(\Phi^TG^{-1}y,I_m - \Phi^TG^{-1}\Phi)\]
where the [n,n] Gram matrix \(G=(\Phi\Phi^T + \sigma^2I_n)\) is defined for the [n,m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
- class feature_decomposition_trajectory(feature_functions: Callable[[trieste.types.TensorType], trieste.types.TensorType], weight_sampler: Callable[[int], trieste.types.TensorType], mean_function: Callable[[trieste.types.TensorType], trieste.types.TensorType])[source]#
Bases: trieste.models.interfaces.TrajectoryFunctionClass
An approximate sample from a Gaussian process posterior, represented as a finite weighted sum of features.
A trajectory is given by
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]where \(\phi_i\) are m feature functions and \(\theta_i\) are feature weights sampled from a posterior distribution.
The number of trajectories (i.e. the batch size) is determined from the first call of the trajectory. In order to change the batch size, a new TrajectoryFunction must be built. A minimal numerical sketch of evaluating such a trajectory follows the parameter list below.
- Parameters:
feature_functions – Set of feature functions.
weight_sampler – A sampler that generates feature weight samples.
mean_function – The underlying model’s mean function.
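A minimal numerical sketch of evaluating such a weighted feature sum (all names here are illustrative, not this class’s internals):

import tensorflow as tf

def trajectory_value(x, feature_functions, weights, mean_function):
    # feature_functions(x): [N, m] feature evaluations; weights: [m, 1]
    # sampled from the weight posterior. Returns \hat{f}(x) with shape [N, 1].
    phi = feature_functions(x)
    return mean_function(x) + tf.linalg.matmul(phi, weights)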
- assert_data_is_compatible(new_data: trieste.data.Dataset, existing_data: trieste.data.Dataset) None [source]#
Checks that new data is compatible with existing data.
- Parameters:
new_data – New data.
existing_data – Existing data.
- Raises:
ValueError – If the trailing dimensions of the query points or observations differ.
- check_optimizer(optimizer: trieste.models.optimizer.BatchOptimizer | trieste.models.optimizer.Optimizer) None [source]#
Check that the optimizer for the GPflow models is using a correct optimizer wrapper.
Stochastic gradient descent based methods implemented in TensorFlow do not work properly without mini-batches, and hence BatchOptimizer, which prepares mini-batches and calls the optimizer iteratively, needs to be used. GPflow’s Scipy optimizer, on the other hand, should use the non-batch wrapper Optimizer. A sketch of the two valid pairings is given after this block.
- Parameters:
optimizer – An instance of the optimizer wrapper with the underlying optimizer.
- Raises:
ValueError – If a gradient-based optimizer is wrapped in Optimizer rather than BatchOptimizer, or if Scipy is wrapped in BatchOptimizer.
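A minimal sketch of the two valid pairings (the specific optimizers are illustrative):

import gpflow
import tensorflow as tf
from trieste.models.gpflow import check_optimizer
from trieste.models.optimizer import BatchOptimizer, Optimizer

# Gradient-based TensorFlow optimizers need mini-batches: use BatchOptimizer.
check_optimizer(BatchOptimizer(tf.optimizers.Adam()))

# GPflow's Scipy optimizer runs full-batch: use the plain Optimizer wrapper.
check_optimizer(Optimizer(gpflow.optimizers.Scipy()))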
- randomize_hyperparameters(object: gpflow.Module) None [source]#
Sets hyperparameters to random samples from their prior distributions or (for Sigmoid constraints with no priors) their constrained domains. Note that it is up to the caller to ensure that the prior, if defined, is compatible with the transform.
- Parameters:
object – Any gpflow Module.
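For instance, one might randomize a kernel whose lengthscale carries a prior (the kernel and prior below are an assumed example):

import gpflow
import numpy as np
import tensorflow_probability as tfp
from trieste.models.gpflow import randomize_hyperparameters

kernel = gpflow.kernels.Matern52()
kernel.lengthscales.prior = tfp.distributions.LogNormal(
    loc=np.float64(-1.0), scale=np.float64(1.0)
)

# The lengthscale is redrawn from its prior; the variance, which here has
# neither a prior nor a Sigmoid constraint, is left unchanged.
randomize_hyperparameters(kernel)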
- squeeze_hyperparameters(object: gpflow.Module, alpha: float = 0.01, epsilon: float = 1e-07) None [source]#
Squeezes the parameters to be strictly inside their range defined by the Sigmoid, or strictly greater than the limit defined by the Shift+Softplus. This avoids having Inf unconstrained values when the parameters are exactly at the boundary.
- Parameters:
object – Any gpflow Module.
alpha – The proportion of the range by which to squeeze, for the Sigmoid case.
epsilon – The value by which to offset the shift, for the Softplus case.
- Raises:
ValueError – If alpha is not in (0, 1) or epsilon <= 0.