trieste.models.gpflow.sampler

This module is the home of the sampling functionality required by Trieste’s GPflow wrappers.

Module Contents

qmc_normal_samples(num_samples: _IntTensorType, n_sample_dim: _IntTensorType, skip: _IntTensorType = 0, dtype: tensorflow.DType = tf.float64) → tensorflow.Tensor: Generates num_samples quasi-random standard normal samples from a Sobol sequence, skipping the first skip points of the sequence, where each sample has dimension n_sample_dim.
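The idea behind qmc_normal_samples can be sketched as follows: take low-discrepancy points in the unit cube and push each coordinate through the inverse standard normal CDF, optionally skipping the first points of the sequence. This is a hypothetical stand-in, not Trieste's implementation. It swaps in a Halton sequence (one van der Corput sequence per prime base) for the Sobol sequence because Halton points are short to write out, and the helper names are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical sketch: Halton points stand in for the Sobol points used by Trieste.
_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse of `index` in `base`, a value in [0, 1)."""
    result, denominator = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denominator *= base
        result += digit / denominator
    return result


def qmc_normal_samples_sketch(num_samples: int, n_sample_dim: int, skip: int = 0) -> list:
    """Return num_samples points of dimension n_sample_dim, skipping the first `skip` points."""
    if num_samples < 0 or skip < 0:
        raise ValueError("num_samples and skip must be non-negative")
    if not 0 < n_sample_dim <= len(_PRIMES):
        raise ValueError(f"n_sample_dim must be between 1 and {len(_PRIMES)}")
    inv_cdf = NormalDist().inv_cdf  # maps a uniform value in (0, 1) to a standard normal
    bases = _PRIMES[:n_sample_dim]
    # Start at index 1: index 0 maps to the point 0, whose inverse CDF is -infinity.
    return [
        [inv_cdf(radical_inverse(i, b)) for b in bases]
        for i in range(skip + 1, skip + 1 + num_samples)
    ]


samples = qmc_normal_samples_sketch(num_samples=4, n_sample_dim=2)
shifted = qmc_normal_samples_sketch(num_samples=3, n_sample_dim=2, skip=1)
print(shifted == samples[1:])  # True: skipping one point continues the same sequence
```

Skipping is what lets successive samplers draw different, non-overlapping blocks from the same deterministic sequence.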

class IndependentReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.ProbabilisticModel, qmc: bool = False, qmc_skip: bool = True)

Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.ProbabilisticModel]

This sampler employs the reparameterization trick to approximate samples from a ProbabilisticModel’s predictive distribution as

\[x \mapsto \mu(x) + \epsilon \sigma(x)\]

where \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
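A minimal NumPy sketch of this reparameterization, assuming a hypothetical model function toy_predict that returns the marginal mean and variance with shape [..., 1, L]. The class below is illustrative only and is not Trieste's sampler; it shows why fixing ε once per sampler makes repeated calls return identical samples.

```python
from __future__ import annotations

import numpy as np


def toy_predict(at: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical model: marginal mean and variance of shape [..., 1, L] with L = 1."""
    mean = np.sin(at).sum(axis=-1, keepdims=True)
    variance = 0.1 + (at**2).sum(axis=-1, keepdims=True)
    return mean, variance


class IndependentReparamSamplerSketch:
    """Sampler x -> mu(x) + eps * sigma(x), with eps drawn once per sampler."""

    def __init__(self, sample_size: int, num_latent: int = 1, seed: int | None = None):
        if sample_size <= 0:
            raise ValueError("sample_size must be positive")
        rng = np.random.default_rng(seed)
        # Drawn once and reused on every call, so samples depend only on `at`.
        self._eps = rng.standard_normal((sample_size, num_latent))  # [S, L]

    def sample(self, at: np.ndarray, predict=toy_predict) -> np.ndarray:
        if at.ndim < 2 or at.shape[-2] != 1:
            raise ValueError("at must have shape [..., 1, D]")
        mean, variance = predict(at)  # each [..., 1, L]
        # [..., 1, 1, L] + [..., 1, 1, L] * [S, 1, L] broadcasts to [..., S, 1, L]
        return mean[..., None, :, :] + np.sqrt(variance)[..., None, :, :] * self._eps[:, None, :]


at = np.random.default_rng(0).uniform(size=(4, 1, 3))  # 4 query points of dimension 3
sampler = IndependentReparamSamplerSketch(sample_size=5, seed=1)
first, second = sampler.sample(at), sampler.sample(at)
print(first.shape)  # (4, 5, 1, 1)
```

Because ε is shared across all query points, each of the S samples is a smooth function of x rather than independent noise at every point.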

Parameters:

sample_size – The number of samples to take at each point. Must be positive.
model – The model to sample from.
qmc – Whether to use QMC Sobol sampling instead of random normal sampling. QMC sampling approximates a normal distribution more accurately than truly random sampling.
qmc_skip – Whether to use the skip parameter to ensure the QMC sampler gives different samples whenever it is reset. This is not supported with XLA.

Raises:

ValueError (or InvalidArgumentError) – If sample_size is not positive.

skip: trieste.types.TensorType: Number of Sobol sequence points to skip. This is incremented for each sampler.

sample(at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) → trieste.types.TensorType

Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given IndependentReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different IndependentReparametrizationSampler instances will produce different samples.

Parameters:

at – Where to sample the predictive distribution, with shape […, 1, D], for points of dimension D.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.

Returns:

The samples, of shape […, S, 1, L], where S is the sample_size and L is the number of latent model dimensions.

Raises:

ValueError (or InvalidArgumentError) – If at has an invalid shape or jitter is negative.

class BatchReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.SupportsPredictJoint, qmc: bool = False, qmc_skip: bool = True)

Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.SupportsPredictJoint]

This sampler employs the reparameterization trick to approximate batches of samples from a ProbabilisticModel’s predictive joint distribution as

\[x \mapsto \mu(x) + \epsilon L(x)\]

where \(L\) is the Cholesky factor s.t. \(LL^T\) is the covariance, and \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
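The joint version can be sketched in NumPy as below, assuming a hypothetical zero-mean model with an RBF covariance (toy_predict_joint). It is an illustration, not Trieste's class: ε is created on the first call with the batch size B, the covariance is jittered before the Cholesky decomposition, and later calls with a different B are rejected, mirroring the documented constraints.

```python
from __future__ import annotations

import numpy as np


def toy_predict_joint(at: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical model: zero mean [..., B, L] and RBF covariance [..., L, B, B], L = 1."""
    sq_dist = ((at[..., :, None, :] - at[..., None, :, :]) ** 2).sum(axis=-1)  # [..., B, B]
    cov = np.exp(-0.5 * sq_dist)[..., None, :, :]  # [..., 1, B, B]
    mean = np.zeros(at.shape[:-1] + (1,))  # [..., B, 1]
    return mean, cov


class BatchReparamSamplerSketch:
    """Sampler x -> mu(x) + L(x) eps, with eps fixed per sampler."""

    def __init__(self, sample_size: int, num_latent: int = 1, seed: int | None = None):
        if sample_size <= 0:
            raise ValueError("sample_size must be positive")
        self._sample_size = sample_size
        self._num_latent = num_latent
        self._rng = np.random.default_rng(seed)
        self._eps: np.ndarray | None = None  # [L, B, S], created on the first call

    def sample(self, at: np.ndarray, jitter: float = 1e-6, predict_joint=toy_predict_joint) -> np.ndarray:
        if at.ndim < 2:
            raise ValueError("at must have shape [..., B, D]")
        if jitter < 0:
            raise ValueError("jitter must be non-negative")
        batch_size = at.shape[-2]
        if batch_size <= 0:
            raise ValueError("batch size must be positive")
        if self._eps is None:
            self._eps = self._rng.standard_normal((self._num_latent, batch_size, self._sample_size))
        elif self._eps.shape[1] != batch_size:
            raise ValueError("batch size must match previous calls")
        mean, cov = predict_joint(at)  # [..., B, L], [..., L, B, B]
        chol = np.linalg.cholesky(cov + jitter * np.eye(batch_size))  # [..., L, B, B]
        correlated = chol @ self._eps  # [..., L, B, S]
        # Reorder to [..., S, B, L] and add the mean, broadcast over the sample axis.
        return mean[..., None, :, :] + np.swapaxes(correlated, -1, -3)


at = np.random.default_rng(0).uniform(0.0, 3.0, size=(3, 2))  # one batch of B = 3 points, D = 2
sampler = BatchReparamSamplerSketch(sample_size=20000, seed=1)
samples = sampler.sample(at)  # shape (20000, 3, 1)
empirical_cov = np.cov(samples[..., 0], rowvar=False)
_, exact_cov = toy_predict_joint(at)
print(np.abs(empirical_cov - exact_cov[0]).max())  # small: Monte Carlo error only
```

Multiplying by the Cholesky factor is what turns independent ε into samples whose covariance across the B points matches the joint predictive covariance.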

Parameters:

sample_size – The number of samples for each batch of points. Must be positive.
model – The model to sample from.
qmc – Whether to use QMC Sobol sampling instead of random normal sampling. QMC sampling approximates a normal distribution more accurately than truly random sampling.
qmc_skip – Whether to use the skip parameter to ensure the QMC sampler gives different samples whenever it is reset. This is not supported with XLA.

Raises:

ValueError (or InvalidArgumentError) – If sample_size is not positive.

skip: trieste.types.TensorType: Number of Sobol sequence points to skip. This is incremented for each sampler.

sample(at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) → trieste.types.TensorType

Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given BatchReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different BatchReparametrizationSampler instances will produce different samples.

Parameters:

at – Batches of query points at which to sample the predictive distribution, with shape […, B, D], for batches of size B of points of dimension D. Must have a consistent batch size across all calls to sample() for any given BatchReparametrizationSampler.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.

Returns:

The samples, of shape […, S, B, L], where S is the sample_size, B the number of points per batch, and L the dimension of the model’s predictive distribution.

Raises:

ValueError (or InvalidArgumentError) – If any of the following are true:
- at is a scalar.
- The batch size B of at is not positive.
- The batch size B of at differs from that of previous calls.
- jitter is negative.

class FeatureDecompositionInternalDataModel

Bases: trieste.models.interfaces.SupportsGetKernel, trieste.models.interfaces.SupportsGetMeanFunction, trieste.models.interfaces.SupportsGetObservationNoise, trieste.models.interfaces.SupportsGetInternalData, typing_extensions.Protocol

A probabilistic model that supports get_kernel, get_mean_function, get_observation_noise and get_internal_data methods.

class FeatureDecompositionInducingPointModel

Bases: trieste.models.interfaces.SupportsGetKernel, trieste.models.interfaces.SupportsGetMeanFunction, trieste.models.interfaces.SupportsGetInducingVariables, typing_extensions.Protocol

A probabilistic model that supports get_kernel, get_mean_function and get_inducing_variables methods.

class FeatureDecompositionTrajectorySampler(model: FeatureDecompositionTrajectorySamplerModelType, feature_functions: ResampleableRandomFourierFeatureFunctions)

Bases: trieste.models.interfaces.TrajectorySampler[FeatureDecompositionTrajectorySamplerModelType], abc.ABC

This is a general class to build functions that approximate a trajectory sampled from an underlying Gaussian process model.

In particular, we approximate the Gaussian process’s posterior samples as the finite feature approximation

\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]

where \(\phi_i\) are m features and \(\theta_i\) are feature weights sampled from a given distribution.

Achieving consistency (ensuring that the same sample draw is used for all evaluations of a particular trajectory function) for exact sample draws from a GP is prohibitively costly because it scales cubically with the number of query points. However, finite feature representations can be evaluated at a constant cost per query, regardless of how many queries are made.
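To make the finite feature idea concrete, here is a NumPy sketch assuming an RBF kernel and random Fourier features φ_i(x) = sqrt(2/m) cos(ω_i·x + b_i), with ω_i drawn from N(0, I/ℓ²) and b_i uniform on [0, 2π]. It draws prior weights θ ~ N(0, I) only, so it illustrates consistency and cost, not the posterior weights Trieste computes; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, dim, lengthscale = 5000, 2, 0.7

# Random Fourier features for k(x, x') = exp(-|x - x'|^2 / (2 l^2)).
omega = rng.normal(scale=1.0 / lengthscale, size=(dim, num_features))
bias = rng.uniform(0.0, 2.0 * np.pi, size=num_features)


def features(x: np.ndarray) -> np.ndarray:
    """Map [N, D] inputs to [N, m] feature evaluations."""
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + bias)


def rbf(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


# One prior trajectory: theta is drawn once and reused for every query.
theta = rng.standard_normal(num_features)


def trajectory(x: np.ndarray) -> np.ndarray:
    return features(x) @ theta  # cost grows linearly with the number of query points


x = rng.uniform(size=(4, dim))
approx_kernel = features(x) @ features(x).T  # approximately rbf(x, x)
print(np.abs(approx_kernel - rbf(x, x)).max())
print(np.allclose(trajectory(x)[:2], trajectory(x[:2])))  # True: consistent across calls
```

No matrix over the query points is ever factorized, so evaluating the same trajectory on new inputs never requires conditioning on earlier evaluations.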

Parameters: model – The model to sample from.
Raises: ValueError – If dataset is empty.

get_trajectory() → trieste.models.interfaces.TrajectoryFunction

Generate an approximate function draw (trajectory) by sampling weights and evaluating the feature functions.

Returns: A trajectory function representing an approximate trajectory from the Gaussian process, taking an input of shape [N, B, D] and returning shape [N, B, L] where L is the number of outputs of the model.

update_trajectory(trajectory: trieste.models.interfaces.TrajectoryFunction) → trieste.models.interfaces.TrajectoryFunction

Efficiently update a TrajectoryFunction to reflect an update in its underlying ProbabilisticModel and resample accordingly.

For a FeatureDecompositionTrajectorySampler, updating the sampler corresponds to resampling the feature functions (taking into account any changed kernel parameters) and recalculating the weight distribution.

Parameters: trajectory – The trajectory function to be resampled.
Returns: The new resampled trajectory function.

resample_trajectory(trajectory: trieste.models.interfaces.TrajectoryFunction) → trieste.models.interfaces.TrajectoryFunction

Efficiently resample a TrajectoryFunction in-place to avoid function retracing with every new sample.

Parameters: trajectory – The trajectory function to be resampled.
Returns: The new resampled trajectory function.

abstract _prepare_weight_sampler() → Callable[[int], trieste.types.TensorType]: Calculate the posterior of the feature weights for the specified feature functions, returning a function that takes in a batch size B and returns B samples for the weights of each of the F features for L outputs.

class RandomFourierFeatureTrajectorySampler(model: FeatureDecompositionInternalDataModel, num_features: int = 1000)

Bases: FeatureDecompositionTrajectorySampler[FeatureDecompositionInternalDataModel]

This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model. For tractability, the Gaussian process is approximated with a Bayesian linear model across a set of features sampled from the Fourier feature decomposition of the model’s kernel. See [HernandezLHG14] for details. Currently we do not support models with multiple latent Gaussian processes.

In particular, we approximate the Gaussian process’s posterior samples as the finite feature approximation

\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]

where \(\phi_i\) are m Fourier features and \(\theta_i\) are feature weights sampled from a posterior distribution that depends on the feature values at the model’s datapoints.

Our implementation follows [HernandezLHG14], with our calculations differing slightly depending on properties of the problem. In particular, we use different calculation strategies depending on the number of considered features m and the number of data points n.

If \(m<n\) then we follow Appendix A of [HernandezLHG14] and calculate the posterior distribution for \(\theta\) following their Bayesian linear regression motivation, i.e. the computation revolves around an O(m^3) inversion of a design matrix.

If \(n<m\) then we use the kernel trick to recast computation to revolve around an O(n^3) inversion of a gram matrix. As well as being more efficient in early BO steps (where \(n<m\)), this second computation method allows much larger choices of m (as required to approximate very flexible kernels).

Parameters:

model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.

Raises:

ValueError – If dataset is empty.

_prepare_weight_sampler() → Callable[[int], trieste.types.TensorType]: Calculate the posterior of theta (the feature weights) for the RFFs, returning a function that takes in a batch size B and returns B samples for the weights of each of the RFF F features for one output.

_prepare_theta_posterior_in_design_space() → tensorflow_probability.distributions.MultivariateNormalTriL

Calculate the posterior of theta (the feature weights) in the design space. This distribution is a Gaussian

\[\theta \sim N(D^{-1}\Phi^Ty,D^{-1}\sigma^2)\]

where the [m,m] design matrix \(D=(\Phi^T\Phi + \sigma^2I_m)\) is defined for the [n,m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
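A design-space weight sampler in the spirit of _prepare_weight_sampler can be sketched in NumPy as follows. The helper design_space_weight_sampler is hypothetical: it builds D, the posterior mean D^{-1}Φ^Ty and covariance σ²D^{-1}, takes a lower-triangular square root of the covariance (as a TriL parameterization would), and returns a closure mapping a batch size B to B weight draws.

```python
import numpy as np


def design_space_weight_sampler(phi: np.ndarray, y: np.ndarray, noise_variance: float, seed: int = 0):
    """Hypothetical helper: returns a function mapping B to [B, m] draws of theta."""
    m = phi.shape[1]
    design = phi.T @ phi + noise_variance * np.eye(m)  # D = Phi^T Phi + sigma^2 I_m, [m, m]
    mean = np.linalg.solve(design, phi.T @ y)  # D^{-1} Phi^T y, [m]
    cov = noise_variance * np.linalg.inv(design)  # sigma^2 D^{-1}, [m, m]
    scale_tril = np.linalg.cholesky(cov)  # lower-triangular square root of the covariance
    rng = np.random.default_rng(seed)

    def sample(batch_size: int) -> np.ndarray:
        eps = rng.standard_normal((batch_size, m))
        return mean + eps @ scale_tril.T  # each row ~ N(mean, cov)

    return sample


rng = np.random.default_rng(1)
phi = rng.standard_normal((20, 5))  # n = 20 data points, m = 5 features (m < n)
y = rng.standard_normal(20)
sampler = design_space_weight_sampler(phi, y, noise_variance=0.1)
print(sampler(3).shape)  # (3, 5)
```

The only matrix factorized here is m by m, which is why this route is preferred when m < n.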

_prepare_theta_posterior_in_gram_space() → tensorflow_probability.distributions.MultivariateNormalTriL

Calculate the posterior of theta (the feature weights) in the gram space.

\[\theta \sim N(\Phi^TG^{-1}y,I_m - \Phi^TG^{-1}\Phi)\]

where the [n,n] gram matrix \(G=(\Phi\Phi^T + \sigma^2I_n)\) is defined for the [n,m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
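The design-space and Gram-space expressions describe the same Gaussian; they are related by the push-through identity for the mean and the Woodbury identity for the covariance. The NumPy check below, on hypothetical random features with n < m, confirms numerically that both routes give the same posterior while the Gram route only inverts an n by n matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, noise_variance = 6, 40, 0.3  # n < m: the Gram-space route inverts only a 6 x 6 matrix
phi = rng.standard_normal((n, m)) / np.sqrt(m)
y = rng.standard_normal(n)

# Design space: theta ~ N(D^{-1} Phi^T y, sigma^2 D^{-1}) with D = Phi^T Phi + sigma^2 I_m.
design = phi.T @ phi + noise_variance * np.eye(m)
mean_design = np.linalg.solve(design, phi.T @ y)
cov_design = noise_variance * np.linalg.inv(design)

# Gram space: theta ~ N(Phi^T G^{-1} y, I_m - Phi^T G^{-1} Phi) with G = Phi Phi^T + sigma^2 I_n.
gram = phi @ phi.T + noise_variance * np.eye(n)
mean_gram = phi.T @ np.linalg.solve(gram, y)
cov_gram = np.eye(m) - phi.T @ np.linalg.solve(gram, phi)

print(np.allclose(mean_design, mean_gram), np.allclose(cov_design, cov_gram))  # True True
```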

class DecoupledTrajectorySampler(model: FeatureDecompositionInducingPointModel | FeatureDecompositionInternalDataModel, num_features: int = 1000)

Bases: FeatureDecompositionTrajectorySampler[Union[FeatureDecompositionInducingPointModel, FeatureDecompositionInternalDataModel]]

This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model using decoupled sampling. See [WBT+20] for an introduction to decoupled sampling.

Unlike our RandomFourierFeatureTrajectorySampler, which uses an RFF decomposition to approximate the Gaussian process posterior, a DecoupledTrajectorySampler only uses an RFF decomposition to approximate the Gaussian process prior and instead uses a canonical decomposition to discretize the effect of updating the prior on the given data.

In particular, we approximate the Gaussian process’s posterior samples as the finite feature approximation

\[\hat{f}(.) = \sum_{i=1}^L w_i\phi_i(.) + \sum_{j=1}^m v_jk(.,z_j)\]

where \(\phi_i(.)\) and \(w_i\) are the Fourier features and their weights that discretize the prior. In contrast, \(k(.,z_j)\) and \(v_j\) are the canonical features and their weights that discretize the data update.

The expression for \(v_j\) depends on whether we are using an exact Gaussian process or a sparse approximation. See eq. (13) in [WBT+20] for details.
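For the exact-GP case, a common form of the update (pathwise conditioning, as in [WBT+20]) takes the canonical centres z_j to be the training inputs and sets v = (K + σ²I)^{-1}(y − f_prior(X) − ε) with ε ~ N(0, σ²I). The NumPy sketch below assumes an RBF kernel, toy 1D data and hypothetical names; it does not cover the sparse case. Since E[w] = 0 and E[ε] = 0, the sample average should recover the exact posterior mean even though the prior is approximated with RFFs.

```python
import numpy as np

rng = np.random.default_rng(0)
lengthscale, noise_variance = 0.3, 0.05
num_features, num_samples = 2000, 2000


def rbf(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


# Random Fourier features discretizing the RBF prior.
omega = rng.normal(scale=1.0 / lengthscale, size=(1, num_features))
bias = rng.uniform(0.0, 2.0 * np.pi, size=num_features)


def rff(x: np.ndarray) -> np.ndarray:
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + bias)  # [N, F]


x_train = np.linspace(0.0, 1.0, 5)[:, None]
y_train = np.sin(6.0 * x_train[:, 0])
x_test = np.array([[0.1], [0.45], [0.8]])

# Prior weights and simulated observation noise, one row per sample: [S, F] and [S, n].
w = rng.standard_normal((num_samples, num_features))
noise = np.sqrt(noise_variance) * rng.standard_normal((num_samples, x_train.shape[0]))

# Canonical weights v = (K + sigma^2 I)^{-1} (y - f_prior(X) - noise), shape [n, S].
gram = rbf(x_train, x_train) + noise_variance * np.eye(x_train.shape[0])
prior_at_train = rff(x_train) @ w.T  # [n, S]
v = np.linalg.solve(gram, y_train[:, None] - prior_at_train - noise.T)

# Decoupled samples: sum_i w_i phi_i(x) + sum_j v_j k(x, z_j), shape [num_test, S].
samples = rff(x_test) @ w.T + rbf(x_test, x_train) @ v

exact_mean = rbf(x_test, x_train) @ np.linalg.solve(gram, y_train)
print(np.abs(samples.mean(axis=1) - exact_mean).max())  # small Monte Carlo error
```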

Note that if a model is of both FeatureDecompositionInducingPointModel type and FeatureDecompositionInternalDataModel type, FeatureDecompositionInducingPointModel takes priority and inducing points will be used for computations rather than data.
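The priority rule can be pictured as structural (protocol-based) dispatch where the inducing-point check runs first. The sketch below uses typing.Protocol with runtime_checkable; the protocol classes, method names and the canonical_feature_source helper are hypothetical simplifications for illustration and do not reproduce Trieste's protocols or internal logic.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class InternalDataModelLike(Protocol):
    """Hypothetical, simplified stand-in for FeatureDecompositionInternalDataModel."""

    def get_internal_data(self): ...


@runtime_checkable
class InducingPointModelLike(Protocol):
    """Hypothetical, simplified stand-in for FeatureDecompositionInducingPointModel."""

    def get_inducing_variables(self): ...


def canonical_feature_source(model) -> str:
    # Check the inducing-point protocol first so that it takes priority.
    if isinstance(model, InducingPointModelLike):
        return "inducing points"
    if isinstance(model, InternalDataModelLike):
        return "training data"
    raise NotImplementedError(f"unsupported model type: {type(model).__name__}")


class ExactModel:
    def get_internal_data(self):
        return "data"


class SparseModelWithData(ExactModel):
    def get_inducing_variables(self):
        return "inducing variables"


print(canonical_feature_source(ExactModel()))  # training data
print(canonical_feature_source(SparseModelWithData()))  # inducing points
```

Because runtime_checkable protocols only check for the presence of methods, a model satisfying both protocols would be routed by whichever check comes first, which is why the order matters.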

Parameters:

model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.

Raises:

NotImplementedError – If the model is not of valid type.

_prepare_weight_sampler() → Callable[[int], trieste.types.TensorType]: Prepare the sampler function that provides samples of the feature weights for both the RFF and canonical feature functions, i.e. we return a function that takes in a batch size B and returns B samples for the weights of each of the F RFF features and M canonical features for L outputs.

class ResampleableRandomFourierFeatureFunctions(model: FeatureDecompositionInducingPointModel | FeatureDecompositionInternalDataModel, n_components: int)

Bases: gpflux.layers.basis_functions.fourier_features.RandomFourierFeaturesCosine

A wrapper around GPflux’s random Fourier feature function that allows for efficient in-place updating when generating new decompositions.

In particular, the bias and weights are stored as variables, which can then be updated by calling resample() without triggering expensive graph retracing.
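The "update the stored values, keep the same storage" pattern can be sketched with NumPy arrays standing in for TensorFlow variables. This is a hypothetical illustration, not the GPflux or Trieste class: in TensorFlow the analogous step would assign new values to existing variables so a traced function keeps working, whereas here we simply write into the existing arrays. The Gaussian frequencies scaled by the lengthscale assume an RBF kernel.

```python
import numpy as np


class ResampleableRFFSketch:
    """Hypothetical RFF feature map whose weights and bias are updated in place."""

    def __init__(self, input_dim: int, n_components: int, lengthscale: float = 1.0, seed: int = 0):
        self._rng = np.random.default_rng(seed)
        self._lengthscale = lengthscale
        self.weights = np.empty((input_dim, n_components))  # plays the role of a variable
        self.bias = np.empty(n_components)
        self.resample()

    def resample(self) -> None:
        # Write into the existing buffers instead of rebinding new arrays.
        self.weights[...] = self._rng.normal(scale=1.0 / self._lengthscale, size=self.weights.shape)
        self.bias[...] = self._rng.uniform(0.0, 2.0 * np.pi, size=self.bias.shape)

    def __call__(self, inputs: np.ndarray) -> np.ndarray:
        n_components = self.bias.shape[0]
        return np.sqrt(2.0 / n_components) * np.cos(inputs @ self.weights + self.bias)


features = ResampleableRFFSketch(input_dim=2, n_components=8)
weights_buffer = features.weights
before = features(np.ones((3, 2)))
features.resample()
after = features(np.ones((3, 2)))
print(features.weights is weights_buffer)  # True: same storage, new values
```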

Parameters:

model – The model that will be approximated by these feature functions.
n_components – The desired number of features.

Raises:

NotImplementedError – If the model is not of valid type.

resample() → None: Resample weights and biases.

call(inputs: trieste.types.TensorType) → trieste.types.TensorType: Evaluate the basis functions at inputs.

class ResampleableDecoupledFeatureFunctions(model: FeatureDecompositionInducingPointModel | FeatureDecompositionInternalDataModel, n_components: int)

Bases: ResampleableRandomFourierFeatureFunctions

A wrapper around our ResampleableRandomFourierFeatureFunctions which, rather than evaluating just F RFF functions, evaluates the concatenation of F RFF functions with evaluations of the canonical basis functions.
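Concatenating the two feature sets turns the decoupled two-sum form into a single weighted sum over F + M features. The NumPy check below, with hypothetical RBF-based feature maps, arbitrary centres z_j and arbitrary weights, shows that multiplying the concatenated features by the concatenated weights reproduces the two separate sums.

```python
import numpy as np

rng = np.random.default_rng(0)
num_rff, lengthscale = 50, 0.5
omega = rng.normal(scale=1.0 / lengthscale, size=(1, num_rff))
bias = rng.uniform(0.0, 2.0 * np.pi, size=num_rff)
centres = np.array([[0.2], [0.5], [0.9]])  # z_j: inducing points or training inputs


def rff(x: np.ndarray) -> np.ndarray:
    return np.sqrt(2.0 / num_rff) * np.cos(x @ omega + bias)  # [N, F]


def canonical(x: np.ndarray) -> np.ndarray:
    sq_dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)  # [N, M], entries k(x, z_j)


def decoupled_features(x: np.ndarray) -> np.ndarray:
    # Concatenate the F prior features with the M canonical features: [N, F + M].
    return np.concatenate([rff(x), canonical(x)], axis=-1)


w = rng.standard_normal(num_rff)  # prior weights
v = rng.standard_normal(centres.shape[0])  # canonical weights (arbitrary here)
x = np.linspace(0.0, 1.0, 7)[:, None]

two_sums = rff(x) @ w + canonical(x) @ v
single_product = decoupled_features(x) @ np.concatenate([w, v])
print(decoupled_features(x).shape, np.allclose(two_sums, single_product))  # (7, 53) True
```

This is what lets a decoupled trajectory reuse the same "features times weights" machinery as a plain RFF trajectory.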

Parameters:

model – The model that will be approximated by these feature functions.
n_components – The desired number of features.

call(inputs: trieste.types.TensorType) → trieste.types.TensorType: Combine prior basis functions with canonical basis functions.

class feature_decomposition_trajectory(feature_functions: Callable[[trieste.types.TensorType], trieste.types.TensorType], weight_sampler: Callable[[int], trieste.types.TensorType], mean_function: Callable[[trieste.types.TensorType], trieste.types.TensorType], encoder: trieste.space.EncoderFunction | None = None)

Bases: trieste.models.interfaces.TrajectoryFunctionClass

An approximate sample from a Gaussian process’s posterior, represented as a finite weighted sum of features.

A trajectory is given by

\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]

where \(\phi_i\) are m feature functions and \(\theta_i\) are feature weights sampled from a posterior distribution.

The number of trajectories (i.e. batch size) is determined from the first call of the trajectory. In order to change the batch size, a new TrajectoryFunction must be built.

Parameters:

feature_functions – Set of feature functions.
weight_sampler – Sampler that generates feature weight samples.
mean_function – The underlying model’s mean function.
encoder – Optional encoder with which to transform input points.

__call__(inputs: trieste.types.TensorType) → trieste.types.TensorType: Call trajectory function.

resample() → None: Efficiently resample in-place without retracing.

update(weight_sampler: Callable[[int], trieste.types.TensorType]) → None

Efficiently update the trajectory with a new weight distribution and resample its weights.

Parameters: weight_sampler – New sampler that generates feature weight samples.
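The trajectory object's behaviour can be sketched as a small NumPy class. FeatureTrajectorySketch is hypothetical and omits the encoder. Its weights are drawn on the first call, which fixes the batch size B. resample() overwrites them in place, and update() swaps in a new weight sampler and then resamples. Inputs of shape [N, B, D] map to outputs of shape [N, B, 1] for a single output.

```python
from typing import Callable, Optional

import numpy as np

Array = np.ndarray


class FeatureTrajectorySketch:
    """Hypothetical batch of trajectories f_b(x) = mean(x) + phi(x) @ theta_b."""

    def __init__(
        self,
        feature_functions: Callable[[Array], Array],  # [N, D] -> [N, F]
        weight_sampler: Callable[[int], Array],  # B -> [B, F]
        mean_function: Callable[[Array], Array],  # [N, D] -> [N, 1]
    ) -> None:
        self._features = feature_functions
        self._weight_sampler = weight_sampler
        self._mean = mean_function
        self._weights: Optional[Array] = None  # [B, F], fixed by the first call

    def __call__(self, inputs: Array) -> Array:
        n, batch_size, dim = inputs.shape
        if self._weights is None:
            self._weights = self._weight_sampler(batch_size)
        elif self._weights.shape[0] != batch_size:
            raise ValueError("batch size is fixed by the first call")
        flat = inputs.reshape(n * batch_size, dim)
        phi = self._features(flat).reshape(n, batch_size, -1)  # [N, B, F]
        values = np.einsum("nbf,bf->nb", phi, self._weights)[..., None]  # [N, B, 1]
        return values + self._mean(flat).reshape(n, batch_size, 1)

    def resample(self) -> None:
        """Draw fresh weights for the same batch size, writing them in place."""
        if self._weights is not None:
            self._weights[...] = self._weight_sampler(self._weights.shape[0])

    def update(self, weight_sampler: Callable[[int], Array]) -> None:
        """Switch to a new weight distribution and resample from it."""
        self._weight_sampler = weight_sampler
        self.resample()


rng = np.random.default_rng(0)
omega = rng.normal(size=(2, 16))
bias = rng.uniform(0.0, 2.0 * np.pi, size=16)


def rff_features(x: Array) -> Array:
    return np.sqrt(2.0 / 16) * np.cos(x @ omega + bias)


def prior_weight_sampler(batch_size: int) -> Array:
    return rng.standard_normal((batch_size, 16))


def zero_mean(x: Array) -> Array:
    return np.zeros((x.shape[0], 1))


trajectory = FeatureTrajectorySketch(rff_features, prior_weight_sampler, zero_mean)
x = rng.uniform(size=(5, 3, 2))  # N = 5 points for each of B = 3 trajectories, D = 2
first = trajectory(x)
repeat_matches = np.allclose(first, trajectory(x))  # weights are fixed between calls
trajectory.resample()
changed_after_resample = not np.allclose(first, trajectory(x))  # new weights were drawn
```

Writing new weights into the existing array, rather than building a new trajectory, is the NumPy analogue of the in-place resampling that avoids retracing.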