trieste.models.gpflow
This package contains the primary interface for Gaussian process models. It also contains a
number of TrainableProbabilisticModel
wrappers for GPflow-based models.
Submodules
Package Contents
- build_gpr(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, likelihood_variance: Optional[float] = None, trainable_likelihood: bool = False) → gpflow.models.GPR
Build a GPR model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance.
trainable_likelihood – If set to True, the Gaussian likelihood parameter is set to be trainable. By default set to False.
- Returns
A GPR model.
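As a quick illustration, here is a minimal sketch of building a GPR model over a two-dimensional unit-box search space; the data here is a toy assumption for the example, not part of the API:

import tensorflow as tf

from trieste.data import Dataset
from trieste.models.gpflow import build_gpr
from trieste.space import Box

# A hypothetical initial design over the unit square.
search_space = Box([0.0, 0.0], [1.0, 1.0])
query_points = search_space.sample(20)
observations = tf.reduce_sum(query_points ** 2, axis=-1, keepdims=True)
data = Dataset(query_points, observations)

# Default Matern52 kernel, Constant mean function and kernel priors.
gpr = build_gpr(data, search_space)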
- build_sgpr(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, likelihood_variance: Optional[float] = None, trainable_likelihood: bool = False, num_inducing_points: Optional[int] = None, trainable_inducing_points: bool = False) → gpflow.models.SGPR
Build an SGPR model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
For performance reasons the number of inducing points should not be changed during Bayesian optimization. Hence, even if the initial dataset is smaller, we advise setting this to a higher number. By default inducing points are set to Sobol samples for continuous search spaces, and simple random samples for discrete or mixed search spaces. This carries the risk that optimization gets stuck if they are not trainable, which calls for adaptive inducing point selection during the optimization. This functionality will be added to Trieste in the future.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance.
trainable_likelihood – If set to True, the Gaussian likelihood parameter is set to be trainable. By default set to False.
num_inducing_points – The number of inducing points can be optionally set to a certain value. If left unspecified (default), this number is set to either NUM_INDUCING_POINTS_PER_DIM times the dimensionality of the search space or the value given by MAX_NUM_INDUCING_POINTS, whichever is smaller.
trainable_inducing_points – If set to True, inducing points will be set to be trainable. This option should be used with caution. By default set to False.
- Returns
An SGPR model.
- build_svgp(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, classification: bool = False, kernel_priors: bool = True, likelihood_variance: Optional[float] = None, trainable_likelihood: bool = False, num_inducing_points: Optional[int] = None, trainable_inducing_points: bool = False) → gpflow.models.SVGP
Build an SVGP model with sensible initial parameters and priors. Both regression and binary classification models are available. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
For performance reasons the number of inducing points should not be changed during Bayesian optimization. Hence, even if the initial dataset is smaller, we advise setting this to a higher number. By default inducing points are set to Sobol samples for continuous search spaces, and simple random samples for discrete or mixed search spaces. This carries the risk that optimization gets stuck if they are not trainable, which calls for adaptive inducing point selection during the optimization. This functionality will be added to Trieste in the future.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
classification – If a classification model is needed, this should be set to True, in which case a Bernoulli likelihood will be used. If a regression model is required, this should be set to False (default), in which case a Gaussian likelihood is used.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance. This argument is ignored in the classification case.
trainable_likelihood – If set to True, the likelihood parameter is set to be trainable. By default set to False. This argument is ignored in the classification case.
num_inducing_points – The number of inducing points can be optionally set to a certain value. If left unspecified (default), this number is set to either NUM_INDUCING_POINTS_PER_DIM times the dimensionality of the search space or the value given by MAX_NUM_INDUCING_POINTS, whichever is smaller.
trainable_inducing_points – If set to True, inducing points will be set to be trainable. This option should be used with caution. By default set to False.
- Returns
An SVGP model.
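For the classification case, a minimal sketch; the synthetic binary labels below are purely illustrative:

import tensorflow as tf

from trieste.data import Dataset
from trieste.models.gpflow import build_svgp
from trieste.space import Box

search_space = Box([0.0, 0.0], [1.0, 1.0])
query_points = search_space.sample(50)
# Hypothetical binary labels in {0, 1}, stored as float observations.
observations = tf.cast(
    tf.reduce_sum(query_points, axis=-1, keepdims=True) > 1.0, tf.float64
)
data = Dataset(query_points, observations)

# classification=True selects a Bernoulli likelihood.
svgp = build_svgp(data, search_space, classification=True, num_inducing_points=25)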
- build_vgp_classifier(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, noise_free: bool = False, kernel_variance: Optional[float] = None) → gpflow.models.VGP
Build a VGP binary classification model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting. In the noise_free case we do not use a prior for the kernel variance parameter.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale). In the noise_free case the kernel variance prior is not set.
noise_free – If there is prior information that the classification problem is a deterministic one, this should be set to True and the kernel variance will be fixed to a higher default value, CLASSIFICATION_KERNEL_VARIANCE_NOISE_FREE, leading to a sharper classification boundary. In this case the prior for the kernel variance parameter is also not set. By default set to False.
kernel_variance – Kernel variance parameter can be optionally set to a certain value. If left unspecified (default), the kernel variance is set to CLASSIFICATION_KERNEL_VARIANCE_NOISE_FREE in the noise_free case and to CLASSIFICATION_KERNEL_VARIANCE otherwise.
- Returns
A VGP model.
- class InducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: abc.ABC, Generic[trieste.models.interfaces.ProbabilisticModelType]
This class provides functionality to update the inducing points of an inducing point-based model as the Bayesian optimization progresses.
The only constraint on subclasses of InducingPointSelector is that they preserve the shape of the inducing points so as not to trigger expensive retracing.
It can often be beneficial to change the inducing points during optimization, for example to allow the model to focus its limited modelling resources on promising areas of the space. See [VMA+21] for demonstrations of some of our InducingPointSelectors.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- calculate_inducing_points(self, current_inducing_points: trieste.types.TensorType, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType
Calculate the new inducing points given the existing inducing points.
If recalc_every_model_update is set to False then we only generate new inducing points for the first calculate_inducing_points() call, with all subsequent calls returning the current inducing points.
- Parameters
current_inducing_points – The current inducing points used by the model.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
- Raises
NotImplementedError – If model has more than one set of inducing variables.
- abstract _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType
Method for calculating new inducing points given a model and dataset.
This method is to be implemented by all subclasses of InducingPointSelector.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
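For illustration, a minimal subclass sketch that selects the M most recent training inputs. This is an illustrative strategy, not one shipped with Trieste, and it assumes the base class stores the constructor's search space as self._search_space:

import tensorflow as tf

from trieste.data import Dataset
from trieste.models.gpflow import InducingPointSelector
from trieste.models.interfaces import ProbabilisticModel
from trieste.types import TensorType


class MostRecentInducingPointSelector(InducingPointSelector[ProbabilisticModel]):
    def _recalculate_inducing_points(
        self, M: int, model: ProbabilisticModel, dataset: Dataset
    ) -> TensorType:
        # Take up to M of the most recent training inputs.
        num_recent = min(M, dataset.query_points.shape[0])
        points = dataset.query_points[-num_recent:]
        if num_recent < M:
            # Pad with uniform random samples from the search space
            # (assumes the base class stores it as self._search_space).
            padding = self._search_space.sample(M - num_recent)
            points = tf.concat([points, padding], axis=0)
        return points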
- class KMeansInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: InducingPointSelector[trieste.models.interfaces.ProbabilisticModel]
An InducingPointSelector that chooses points as centroids of a K-means clustering of the training data.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModel, dataset: trieste.data.Dataset) → trieste.types.TensorType
Calculate M centroids from a K-means clustering of the training data.
If the clustering returns fewer than M centroids, or if we have fewer than M training data points, then we fill the remaining points with random samples from across the search space.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The new updated inducing points.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty.
- class RandomSubSampleInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: InducingPointSelector[trieste.models.interfaces.ProbabilisticModel]
An InducingPointSelector that chooses points at random from the training data.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModel, dataset: trieste.data.Dataset) → trieste.types.TensorType
Sample M points from the training data without replacement. If we require more inducing points than training data points, then we fill the remaining points with random samples from across the search space.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The new updated inducing points.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty.
- class UniformInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: InducingPointSelector[trieste.models.interfaces.ProbabilisticModel]
An InducingPointSelector that chooses points sampled uniformly across the search space.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModel, dataset: trieste.data.Dataset) → trieste.types.TensorType
Sample M points. If search_space is a Box then we use a space-filling Sobol design to ensure high diversity.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
- class GPflowPredictor(optimizer: Optimizer | None = None)
Bases: trieste.models.interfaces.SupportsPredictJoint, trieste.models.interfaces.SupportsGetKernel, trieste.models.interfaces.SupportsGetObservationNoise, trieste.models.interfaces.HasReparamSampler, abc.ABC
A trainable wrapper for a GPflow Gaussian process model.
- Parameters
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
- property optimizer(self) → trieste.models.optimizer.Optimizer
The optimizer with which to train the model.
- create_posterior_cache(self) → None
Create a posterior cache for fast sequential predictions. Note that this must happen at initialisation and after we ensure the model data is variable. Furthermore, the cache must be updated whenever the underlying model is changed.
- update_posterior_cache(self) → None
Update the posterior cache. This needs to be called whenever the underlying model is changed.
- property model(self) → gpflow.models.GPModel
The underlying GPflow model.
- predict(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points.
This is essentially a convenience method for predict_joint(), where non-event dimensions of query_points are all interpreted as broadcasting dimensions instead of batch dimensions, and the covariance is squeezed to remove redundant nesting.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- predict_joint(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
- Parameters
query_points – The points at which to make predictions, of shape […, B, D].
- Returns
The mean and covariance of the joint marginal distribution at each batch of points in query_points. For a predictive distribution with event shape E, the mean will have shape […, B] + E, and the covariance shape […] + E + [B, B].
- sample(self, query_points: trieste.types.TensorType, num_samples: int) → trieste.types.TensorType
Return num_samples samples from the independent marginal distributions at query_points.
- Parameters
query_points – The points at which to sample, with shape […, N, D].
num_samples – The number of samples at each point.
- Returns
The samples. For a predictive distribution with event shape E, this has shape […, S, N] + E, where S is the number of samples.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- get_kernel(self) → gpflow.kernels.Kernel
Return the kernel of the model.
- Returns
The kernel.
- get_mean_function(self) → gpflow.mean_functions.MeanFunction
Return the mean function of the model.
- Returns
The mean function.
- get_observation_noise(self) → trieste.types.TensorType
Return the variance of observation noise for homoscedastic likelihoods.
- Returns
The observation noise.
- Raises
NotImplementedError – If the model does not have a homoscedastic likelihood.
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
- Parameters
dataset – The data with which to optimize the model.
- log(self, dataset: Optional[trieste.data.Dataset] = None) → None
Log model-specific information at a given optimization step.
- Parameters
dataset – Optional data that can be used to log additional data-based model summaries.
- reparam_sampler(self, num_samples: int) → trieste.models.interfaces.ReparametrizationSampler[GPflowPredictor]
Return a reparametrization sampler providing num_samples samples.
- Returns
The reparametrization sampler.
- class GaussianProcessRegression(model: gpflow.models.GPR, optimizer: Optimizer | None = None, num_kernel_samples: int = 10, num_rff_features: int = 1000, use_decoupled_sampler: bool = True)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.interfaces.FastUpdateModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInternalData, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow GPR.
As Bayesian optimization requires a large number of sequential predictions (i.e. when maximizing acquisition functions), rather than calling the model directly at prediction time we instead call the posterior objects built by these models. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The GPflow model to wrap.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
num_kernel_samples – Number of randomly sampled kernels (for each kernel parameter) to evaluate before beginning model optimization. Therefore, for a kernel with p (vector-valued) parameters, we evaluate p * num_kernel_samples kernels.
num_rff_features – The number of random Fourier features used to approximate the kernel when calling trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
use_decoupled_sampler – If True use a decoupled random Fourier feature sampler, else just use a random Fourier feature sampler. The decoupled sampler suffers less from overestimating variance and can typically get away with a lower num_rff_features.
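For example, a sketch of wrapping a model built with build_gpr, reusing the toy data and search_space from the build_gpr example earlier on this page:

from trieste.models.gpflow import GaussianProcessRegression, build_gpr

# `data` and `search_space` as in the build_gpr sketch above.
model = GaussianProcessRegression(build_gpr(data, search_space))

model.optimize(data)  # fit kernel parameters, trying sampled initializations
mean, variance = model.predict(data.query_points)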
- property model(self) → gpflow.models.GPR
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(self, dataset: trieste.data.Dataset) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
\[\Sigma_{12} = K_{12} - K_{x1}(K_{xx} + \sigma^2 I)^{-1}K_{x2}\]
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, N, D]
query_points_2 – Sets of query points with shape [M, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, N, M] (L being the number of latent GPs = number of output dimensions)
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
For GaussianProcessRegression, we (optionally) try multiple randomly sampled kernel parameter configurations as well as the configuration specified when initializing the kernel. The best configuration is used as the starting point for model optimization.
For trainable parameters constrained to lie in a finite interval (through a sigmoid bijector), we begin model optimization from the best of a random sample from these parameters’ acceptable domains.
For trainable parameters without constraints but with priors, we begin model optimization from the best of a random sample from these parameters’ priors.
For trainable parameters with neither priors nor constraints, we begin optimization from their initial values.
- Parameters
dataset – The data with which to optimize the model.
- find_best_model_initialization(self, num_kernel_samples: int) → None
Test num_kernel_samples models with sampled kernel parameters. The model’s kernel parameters are then set to the sample achieving maximal likelihood.
- Parameters
num_kernel_samples – Number of randomly sampled kernels to evaluate.
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[GaussianProcessRegression]
Return a trajectory sampler. For GaussianProcessRegression, we build trajectories using a random Fourier feature approximation.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- get_internal_data(self) → trieste.data.Dataset
Return the model’s training data.
- Returns
The model’s training data.
- conditional_predict_f(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Returns the marginal GP distribution at query_points conditioned on both the model and some additional data, using an exact formula. See [CGE14] (eqs. 8-10) for details.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns
mean_qp_new: predictive mean at query_points, with shape […, M, L], and var_qp_new: predictive variance at query_points, with shape […, M, L]
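As a sketch, conditioning on hypothetical extra observations without retraining, following the shapes documented above (model is a GaussianProcessRegression wrapper as in the earlier sketch):

import tensorflow as tf

from trieste.data import Dataset

# Hypothetical additional data: N=2 points with L=1 output.
additional_data = Dataset(
    tf.constant([[0.2, 0.3], [0.6, 0.1]], dtype=tf.float64),
    tf.constant([[0.15], [0.40]], dtype=tf.float64),
)
query_points = tf.constant([[0.5, 0.5]], dtype=tf.float64)  # M=1, D=2

# Both outputs have shape [M, L] = [1, 1].
mean, var = model.conditional_predict_f(query_points, additional_data)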
- conditional_predict_joint(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Predicts the joint GP distribution at query_points conditioned on both the model and some additional data, using an exact formula. See [CGE14] (eqs. 8-10) for details.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns
mean_qp_new: predictive mean at query_points, with shape […, M, L], and cov_qp_new: predictive covariance between query_points, with shape […, L, M, M]
- conditional_predict_f_sample(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset, num_samples: int) → trieste.types.TensorType
Generates samples of the GP at query_points conditioned on both the model and some additional data.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
num_samples – number of samples
- Returns
samples of f at query points, with shape […, num_samples, M, L]
- conditional_predict_y(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Generates samples of y from the GP at query_points conditioned on both the model and some additional data.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns
predictive mean at query_points, with shape […, M, L], and predictive variance at query_points, with shape […, M, L]
- class SparseGaussianProcessRegression(model: gpflow.models.SGPR, optimizer: Optimizer | None = None, num_rff_features: int = 1000, inducing_point_selector: Optional[trieste.models.gpflow.inducing_point_selectors.InducingPointSelector[SparseGaussianProcessRegression]] = None)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.SupportsGetInternalData, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow SGPR. At the moment we only support models with a single latent GP. This is due to the compute_qu method in SGPR, used for computing covariance between query points and for trajectory sampling, which at the moment works only for a single latent GP.
Similarly to our GaussianProcessRegression, our SGPR wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The GPflow model to wrap.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
num_rff_features – The number of random Fourier features used to approximate the kernel when calling trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
inducing_point_selector – The (optional) desired inducing point selector that will update the underlying GPflow SGPR model’s inducing points as the optimization progresses.
- Raises
NotImplementedError (or ValueError) – If we try to use a model with an invalid num_rff_features, or an inducing_point_selector with a model that has more than one set of inducing points.
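A sketch of pairing the wrapper with an inducing point selector, reusing the toy data and search_space from the earlier examples:

from trieste.models.gpflow import (
    KMeansInducingPointSelector,
    SparseGaussianProcessRegression,
    build_sgpr,
)

# `data` and `search_space` as in the earlier sketches.
model = SparseGaussianProcessRegression(
    build_sgpr(data, search_space, num_inducing_points=50),
    inducing_point_selector=KMeansInducingPointSelector(search_space),
)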
- property model(self) → gpflow.models.SGPR
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
- Parameters
dataset – The data with which to optimize the model.
- update(self, dataset: trieste.data.Dataset) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
- _update_inducing_variables(self, new_inducing_points: trieste.types.TensorType) → None
When updating the inducing points of a model, we must also update the other inducing variables, i.e. q_mu and q_sqrt, accordingly. The exact form of this update depends on whether we are using whitened representations of the inducing variables. See _whiten_points() for details.
- Parameters
new_inducing_points – The desired values for the new inducing points.
- Raises
NotImplementedError – If we try to update the inducing variables of a model that has more than one set of inducing points.
- get_inducing_variables(self) → Tuple[Union[trieste.types.TensorType, list[trieste.types.TensorType]], trieste.types.TensorType, trieste.types.TensorType, bool]
Return the model’s inducing variables. The SGPR model does not have q_mu, q_sqrt and whiten objects. We can use the compute_qu method to obtain q_mu and q_sqrt, while the SGPR model does not use the whitened representation. Note that at the moment compute_qu works only for a single latent GP and returns q_sqrt in a shape that is inconsistent with the SVGP model (hence we need to modify its shape).
- Returns
The inducing points (i.e. locations of the inducing variables), as a Tensor or a list of Tensors (when the model has multiple inducing points); a tensor containing the variational mean q_mu; a tensor containing the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- Raises
NotImplementedError – If the model has more than one latent GP.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[SparseGaussianProcessRegression]
Return a trajectory sampler. For SparseGaussianProcessRegression, we build trajectories using a decoupled random Fourier feature approximation. Note that this is available only for single output models.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- get_internal_data(self) → trieste.data.Dataset
Return the model’s training data.
- Returns
The model’s training data.
- class SparseVariational(model: gpflow.models.SVGP, optimizer: Optimizer | None = None, num_rff_features: int = 1000, inducing_point_selector: Optional[trieste.models.gpflow.inducing_point_selectors.InducingPointSelector[SparseVariational]] = None)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow SVGP.
Similarly to our GaussianProcessRegression, our SVGP wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The underlying GPflow sparse variational model.
optimizer – The optimizer with which to train the model. Defaults to BatchOptimizer with Adam with batch size 100.
num_rff_features – The number of random Fourier features used to approximate the kernel when performing decoupled Thompson sampling through its trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
inducing_point_selector – The (optional) desired inducing point selector that will update the underlying GPflow sparse variational model’s inducing points as the optimization progresses.
- Raises
NotImplementedError – If we try to use an inducing_point_selector with a model that has more than one set of inducing points.
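A sketch of wrapping an SVGP with the default-style batch optimizer made explicit; the batch size here is illustrative, and data and search_space are the earlier toy assumptions:

import tensorflow as tf

from trieste.models.gpflow import SparseVariational, build_svgp
from trieste.models.optimizer import BatchOptimizer

# `data` and `search_space` as in the earlier sketches.
model = SparseVariational(
    build_svgp(data, search_space, num_inducing_points=50),
    optimizer=BatchOptimizer(tf.optimizers.Adam(), batch_size=100),
)
model.optimize(data)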
- property model(self) → gpflow.models.SVGP
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(self, dataset: trieste.data.Dataset) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
- Parameters
dataset – The data with which to optimize the model.
- _update_inducing_variables(self, new_inducing_points: trieste.types.TensorType) → None
When updating the inducing points of a model, we must also update the other inducing variables, i.e. q_mu and q_sqrt, accordingly. The exact form of this update depends on whether we are using whitened representations of the inducing variables. See _whiten_points() for details.
- Parameters
new_inducing_points – The desired values for the new inducing points.
- Raises
NotImplementedError – If we try to update the inducing variables of a model that has more than one set of inducing points.
- get_inducing_variables(self) → Tuple[Union[trieste.types.TensorType, list[trieste.types.TensorType]], trieste.types.TensorType, trieste.types.TensorType, bool]
Return the model’s inducing variables.
- Returns
The inducing points (i.e. locations of the inducing variables), as a Tensor or a list of Tensors (when the model has multiple inducing points); a tensor containing the variational mean q_mu; a tensor containing the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[SparseVariational]
Return a trajectory sampler. For SparseVariational, we build trajectories using a decoupled random Fourier feature approximation.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- class VariationalGaussianProcess(model: gpflow.models.VGP, optimizer: Optimizer | None = None, use_natgrads: bool = False, natgrad_gamma: Optional[float] = None, num_rff_features: int = 1000)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow VGP.
A Variational Gaussian Process (VGP) approximates the posterior of a GP using the multivariate Gaussian closest to the posterior of the GP by minimizing the KL divergence between the approximated and exact posteriors. See [OA09] for details.
The VGP provides (approximate) GP modelling under non-Gaussian likelihoods, for example when fitting a classification model over binary data.
A whitened representation and (optional) natural gradient steps are used to aid model optimization.
Similarly to our GaussianProcessRegression, our VGP wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The GPflow VGP.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
use_natgrads – If True then alternate model optimization steps with natural gradient updates. Note that natural gradients require a BatchOptimizer wrapper with an Optimizer optimizer.
num_rff_features – The number of random Fourier features used to approximate the kernel when performing decoupled Thompson sampling through its trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
natgrad_gamma – Gamma parameter for the natural gradient optimizer.
- Raises
ValueError (or InvalidArgumentError) – If the model’s q_sqrt is not rank 3, or if attempting to combine natural gradients with a Scipy optimizer.
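A sketch combining the classifier builder with natural gradient updates; the gamma value is an illustrative choice, and data and search_space are the earlier classification toy assumptions:

import tensorflow as tf

from trieste.models.gpflow import VariationalGaussianProcess, build_vgp_classifier
from trieste.models.optimizer import BatchOptimizer

# `data` and `search_space` as in the classification sketch above.
model = VariationalGaussianProcess(
    build_vgp_classifier(data, search_space),
    optimizer=BatchOptimizer(tf.optimizers.Adam()),  # natgrads need BatchOptimizer
    use_natgrads=True,
    natgrad_gamma=0.1,  # illustrative step size for the natural gradient steps
)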
- property model(self) → gpflow.models.VGP
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(self, dataset: trieste.data.Dataset, *, jitter: float = DEFAULTS.JITTER) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
jitter – The size of the jitter to use when stabilizing the Cholesky decomposition of the covariance matrix.
- optimize(self, dataset: trieste.data.Dataset) → None
VariationalGaussianProcess has a custom optimize method that (optionally) permits alternating between standard optimization steps (for kernel parameters) and natural gradient steps for the variational parameters (q_mu and q_sqrt). See [SEH18] for details. Using natural gradients can dramatically speed up model fitting, especially for ill-conditioned posteriors.
If using natural gradients, our optimizer inherits the mini-batch behavior and number of optimization steps of the base optimizer specified when initializing the VariationalGaussianProcess.
- get_inducing_variables(self) → Tuple[trieste.types.TensorType, trieste.types.TensorType, trieste.types.TensorType, bool]
Return the model’s inducing variables. Note that GPflow’s VGP model is hard-coded to use the whitened representation.
- Returns
Tensors containing: the inducing points (i.e. locations of the inducing variables); the variational mean q_mu; the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[VariationalGaussianProcess]
Return a trajectory sampler. For VariationalGaussianProcess, we build trajectories using a decoupled random Fourier feature approximation.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- class BatchReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.SupportsPredictJoint)
Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.SupportsPredictJoint]
This sampler employs the reparameterization trick to approximate batches of samples from a ProbabilisticModel’s predictive joint distribution as
\[x \mapsto \mu(x) + \epsilon L(x)\]
where \(L\) is the Cholesky factor s.t. \(LL^T\) is the covariance, and \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
- Parameters
sample_size – The number of samples for each batch of points. Must be positive.
model – The model to sample from.
- Raises
ValueError (or InvalidArgumentError) – If sample_size is not positive.
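A sketch of drawing consistent batch samples; model is any wrapper supporting predict_joint, e.g. the GaussianProcessRegression wrapper from the earlier sketches:

import tensorflow as tf

from trieste.models.gpflow import BatchReparametrizationSampler

sampler = BatchReparametrizationSampler(sample_size=100, model=model)

at = tf.random.uniform([5, 2], dtype=tf.float64)  # one batch of B=5 points, D=2
samples = sampler.sample(at)  # shape [100, 5, L]; repeated calls are identical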
- sample(self, at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) → trieste.types.TensorType
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given BatchReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different BatchReparametrizationSampler instances will produce different samples.
- Parameters
at – Batches of query points at which to sample the predictive distribution, with shape […, B, D], for batches of size B of points of dimension D. Must have a consistent batch size across all calls to sample() for any given BatchReparametrizationSampler.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns
The samples, of shape […, S, B, L], where S is the sample_size, B the number of points per batch, and L the dimension of the model’s predictive distribution.
- Raises
ValueError (or InvalidArgumentError) – If any of the following are true: at is a scalar; the batch size B of at is not positive; the batch size B of at differs from that of previous calls; or jitter is negative.
- class DecoupledTrajectorySampler(model: Union[FeatureDecompositionInducingPointModel, FeatureDecompositionInternalDataModel], num_features: int = 1000)
Bases: FeatureDecompositionTrajectorySampler[Union[FeatureDecompositionInducingPointModel, FeatureDecompositionInternalDataModel]]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model using decoupled sampling. See [WBT+20] for an introduction to decoupled sampling. Currently we do not support models with multiple latent Gaussian processes.
Unlike our RandomFourierFeatureTrajectorySampler, which uses an RFF decomposition to approximate the Gaussian process posterior, a DecoupledTrajectorySampler only uses an RFF decomposition to approximate the Gaussian process prior, and instead uses a canonical decomposition to discretize the effect of updating the prior on the given data.
In particular, we approximate the Gaussian process posterior samples as the finite feature approximation
\[\hat{f}(.) = \sum_{i=1}^L w_i\phi_i(.) + \sum_{j=1}^m v_jk(.,z_j)\]
where \(\phi_i(.)\) and \(w_i\) are the Fourier features and their weights that discretize the prior. In contrast, \(k(.,z_j)\) and \(v_j\) are the canonical features and their weights that discretize the data update.
The expression for \(v_j\) depends on whether we are using an exact Gaussian process or a sparse approximation. See eq. (13) in [WBT+20] for details.
Note that if a model is both of FeatureDecompositionInducingPointModel type and FeatureDecompositionInternalDataModel type, FeatureDecompositionInducingPointModel will take priority and inducing points will be used for computations rather than data.
- Parameters
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises
NotImplementedError – If the model is not of valid type.
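A sketch of drawing and evaluating a trajectory; model is assumed to be a wrapper exposing internal data, e.g. the GaussianProcessRegression wrapper from above, and the [N, B, D] input convention for trajectories is an assumption of this example:

from trieste.models.gpflow import DecoupledTrajectorySampler

sampler = DecoupledTrajectorySampler(model, num_features=1000)
trajectory = sampler.get_trajectory()

# Evaluate the sampled function: inputs of shape [N, B, D], where B is the
# number of trajectories in the batch (assumed shape convention).
values = trajectory(data.query_points[:, None, :])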
- _prepare_weight_sampler(self) → Callable[[int], trieste.types.TensorType]
Prepare the sampler function that provides samples of the feature weights for both the RFF and canonical feature functions, i.e. we return a function that takes in a batch size B and returns B samples for the weights of each of the L RFF features and N canonical features.
- class IndependentReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.ProbabilisticModel)
Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.ProbabilisticModel]
This sampler employs the reparameterization trick to approximate samples from a ProbabilisticModel’s predictive distribution as
\[x \mapsto \mu(x) + \epsilon \sigma(x)\]
where \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
- Parameters
sample_size – The number of samples to take at each point. Must be positive.
model – The model to sample from.
- Raises
ValueError (or InvalidArgumentError) – If sample_size is not positive.
- sample(self, at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) → trieste.types.TensorType
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given IndependentReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different IndependentReparametrizationSampler instances will produce different samples.
- Parameters
at – Where to sample the predictive distribution, with shape […, 1, D], for points of dimension D.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns
The samples, of shape […, S, 1, L], where S is the sample_size and L is the number of latent model dimensions.
- Raises
ValueError (or InvalidArgumentError) – If at has an invalid shape or jitter is negative.
- class RandomFourierFeatureTrajectorySampler(model: FeatureDecompositionInternalDataModel, num_features: int = 1000)
Bases: FeatureDecompositionTrajectorySampler[FeatureDecompositionInternalDataModel]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model. For tractability, the Gaussian process is approximated with a Bayesian linear model across a set of features sampled from the Fourier feature decomposition of the model’s kernel. See [HernandezLHG14] for details. Currently we do not support models with multiple latent Gaussian processes.
In particular, we approximate the Gaussian process posterior samples as the finite feature approximation
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]
where \(\phi_i\) are m Fourier features and \(\theta_i\) are feature weights sampled from a posterior distribution that depends on the feature values at the model’s datapoints.
Our implementation follows [HernandezLHG14], with our calculations differing slightly depending on properties of the problem. In particular, we use different calculation strategies depending on the number of considered features m and the number of data points n.
If \(m < n\) then we follow Appendix A of [HernandezLHG14] and calculate the posterior distribution for \(\theta\) following their Bayesian linear regression motivation, i.e. the computation revolves around an O(m^3) inversion of a design matrix.
If \(n < m\) then we use the kernel trick to recast the computation to revolve around an O(n^3) inversion of a Gram matrix. As well as being more efficient in early BO steps (where \(n < m\)), this second computation method allows much larger choices of m (as required to approximate very flexible kernels).
- Parameters
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises
ValueError – If dataset is empty.
- _prepare_weight_sampler(self) → Callable[[int], trieste.types.TensorType]
Calculate the posterior of theta (the feature weights) for the RFFs, returning a function that takes in a batch size B and returns B samples for the weights of each of the L RFF features.
- _prepare_theta_posterior_in_design_space(self) → tensorflow_probability.distributions.MultivariateNormalTriL
Calculate the posterior of theta (the feature weights) in the design space. This distribution is a Gaussian
\[\theta \sim N(D^{-1}\Phi^Ty, D^{-1}\sigma^2)\]
where the [m, m] design matrix \(D = \Phi^T\Phi + \sigma^2I_m\) is defined for the [n, m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
- _prepare_theta_posterior_in_gram_space(self) → tensorflow_probability.distributions.MultivariateNormalTriL
Calculate the posterior of theta (the feature weights) in the Gram space.
\[\theta \sim N(\Phi^TG^{-1}y, I_m - \Phi^TG^{-1}\Phi)\]
where the [n, n] Gram matrix \(G = \Phi\Phi^T + \sigma^2I_n\) is defined for the [n, m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
- class feature_decomposition_trajectory(feature_functions: Callable[[trieste.types.TensorType], trieste.types.TensorType], weight_sampler: Callable[[int], trieste.types.TensorType], mean_function: Callable[[trieste.types.TensorType], trieste.types.TensorType])
Bases: trieste.models.interfaces.TrajectoryFunctionClass
An approximate sample from a Gaussian process posterior, represented as a finite weighted sum of features.
A trajectory is given by
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]
where \(\phi_i\) are m feature functions and \(\theta_i\) are feature weights sampled from a posterior distribution.
The number of trajectories (i.e. batch size) is determined from the first call of the trajectory. In order to change the batch size, a new TrajectoryFunction must be built.
- Parameters
feature_functions – Set of feature functions.
weight_sampler – Sampler that generates feature weight samples.
mean_function – The underlying model’s mean function.
- __call__(self, x: trieste.types.TensorType) → trieste.types.TensorType
Call the trajectory function.
- update(self, weight_sampler: Callable[[int], trieste.types.TensorType]) → None
Efficiently update the trajectory with a new weight distribution and resample its weights.
- Parameters
weight_sampler – New sampler that generates feature weight samples.
- assert_data_is_compatible(new_data: trieste.data.Dataset, existing_data: trieste.data.Dataset) → None
Checks that new data is compatible with existing data.
- Parameters
new_data – New data.
existing_data – Existing data.
- Raises
ValueError – If the trailing dimensions of the query points or observations differ.
- check_optimizer(optimizer: Union[trieste.models.optimizer.BatchOptimizer, trieste.models.optimizer.Optimizer]) → None
Check that the optimizer for the GPflow models is using a correct optimizer wrapper.
Stochastic gradient descent based methods implemented in TensorFlow would not work properly without mini-batches, and hence the BatchOptimizer wrapper, which prepares mini-batches and calls the optimizer iteratively, needs to be used. GPflow’s Scipy optimizer, on the other hand, should use the non-batch wrapper Optimizer.
- Parameters
optimizer – An instance of the optimizer wrapper with the underlying optimizer.
- Raises
ValueError – If a TensorFlow optimizer is not using BatchOptimizer, or if Scipy is using BatchOptimizer.
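For instance, a sketch of valid and invalid pairings:

import gpflow
import tensorflow as tf

from trieste.models.gpflow import check_optimizer
from trieste.models.optimizer import BatchOptimizer, Optimizer

check_optimizer(BatchOptimizer(tf.optimizers.Adam()))  # OK: Adam needs mini-batches
check_optimizer(Optimizer(gpflow.optimizers.Scipy()))  # OK: Scipy uses full batches
# check_optimizer(Optimizer(tf.optimizers.Adam()))     # would raise ValueError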
- randomize_hyperparameters(object: gpflow.Module) → None
Sets hyperparameters to random samples from their constrained domains or (if no constraints are available) their prior distributions.
- Parameters
object – Any gpflow Module.
- squeeze_hyperparameters(object: gpflow.Module, alpha: float = 0.01, epsilon: float = 1e-07) → None
Squeezes the parameters to be strictly inside their range defined by the Sigmoid, or strictly greater than the limit defined by the Shift+Softplus. This avoids having Inf unconstrained values when the parameters are exactly at the boundary.
- Parameters
object – Any gpflow Module.
alpha – The proportion of the range with which to squeeze for the Sigmoid case.
epsilon – The value with which to offset the shift for the Softplus case.
- Raises
ValueError – If alpha is not in (0, 1) or epsilon <= 0.