trieste.models.gpflow.sampler#
This module is the home of the sampling functionality required by Trieste’s GPflow wrappers.
Module Contents#
- qmc_normal_samples(num_samples: _IntTensorType, n_sample_dim: _IntTensorType, skip: _IntTensorType = 0, dtype: tensorflow.DType = tf.float64) tensorflow.Tensor[source]#
Generates num_samples Sobol samples, skipping the first skip points of the sequence, where each sample has dimension n_sample_dim.
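The idea of skipping into a low-discrepancy sequence and mapping the points to normal samples can be sketched with the standard library alone. This is a minimal one-dimensional sketch, not the library's implementation: it assumes the uniform points are pushed through the standard normal quantile function, it uses the base-2 van der Corput sequence as a stand-in for a full Sobol generator, and the names van_der_corput and qmc_normal_1d are hypothetical.

```python
from statistics import NormalDist


def van_der_corput(index: int) -> float:
    """Return point `index` of the base-2 van der Corput sequence in [0, 1)."""
    value, denominator = 0.0, 1.0
    while index > 0:
        denominator *= 2.0
        index, bit = divmod(index, 2)
        value += bit / denominator
    return value


def qmc_normal_1d(num_samples: int, skip: int = 0) -> list[float]:
    """Hypothetical 1-D analogue of qmc_normal_samples.

    Index 0 of the sequence is exactly 0.0, which has no finite normal
    quantile, so this sketch always starts from index 1 (an assumption of
    the sketch, not a statement about the library).
    """
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    start = skip + 1
    return [
        standard_normal.inv_cdf(van_der_corput(i))
        for i in range(start, start + num_samples)
    ]


samples = qmc_normal_1d(4)           # quantiles of 0.5, 0.25, 0.75, 0.125
shifted = qmc_normal_1d(2, skip=2)   # same sequence, entered two points later
```

Because the sequence is deterministic, changing skip is the only way to obtain different samples, which is why each sampler below keeps its own skip counter.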
- class IndependentReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.ProbabilisticModel, qmc: bool = False, qmc_skip: bool = True)[source]#
Bases:
trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.ProbabilisticModel]
This sampler employs the reparameterization trick to approximate samples from a ProbabilisticModel’s predictive distribution as
\[x \mapsto \mu(x) + \epsilon \sigma(x)\]
where \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
- Parameters:
sample_size – The number of samples to take at each point. Must be positive.
model – The model to sample from.
qmc – Whether to use QMC Sobol sampling instead of random normal sampling. QMC sampling approximates a normal distribution more accurately than truly random samples.
qmc_skip – Whether to use the skip parameter to ensure the QMC sampler gives different samples whenever it is reset. This is not supported with XLA.
- Raises:
ValueError (or InvalidArgumentError) – If sample_size is not positive.
- skip: trieste.types.TensorType[source]#
Number of Sobol sequence points to skip. This is incremented for each sampler.
- sample(at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) trieste.types.TensorType[source]#
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given IndependentReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different IndependentReparametrizationSampler instances will produce different samples.
- Parameters:
at – Where to sample the predictive distribution, with shape […, 1, D], for points of dimension D.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns:
The samples, of shape […, S, 1, L], where S is the sample_size and L is the number of latent model dimensions.
- Raises:
ValueError (or InvalidArgumentError) – If at has an invalid shape or jitter is negative.
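The repeatability described for sample() follows from drawing \(\epsilon\) once and reusing it. The following is a minimal NumPy sketch of that idea rather than the library's code: the class name IndependentSamplerSketch is hypothetical, and it takes the predictive mean and variance directly instead of querying a model.

```python
import numpy as np


class IndependentSamplerSketch:
    """Hypothetical NumPy analogue of an independent reparametrization sampler."""

    def __init__(self, sample_size: int, seed: int = 0) -> None:
        if sample_size <= 0:
            raise ValueError("sample_size must be positive")
        self._sample_size = sample_size
        self._rng = np.random.default_rng(seed)
        self._eps = None  # drawn lazily, once the number of outputs L is known

    def sample(self, mean: np.ndarray, variance: np.ndarray) -> np.ndarray:
        """Map marginal moments of shape [..., 1, L] to samples [..., S, 1, L]."""
        if self._eps is None:
            num_outputs = mean.shape[-1]
            self._eps = self._rng.standard_normal((self._sample_size, 1, num_outputs))
        # Insert the sample axis and broadcast the fixed noise over it.
        return mean[..., None, :, :] + np.sqrt(variance[..., None, :, :]) * self._eps


mean = np.zeros((5, 1, 2))            # stand-in for the model's predictive mean
variance = np.full((5, 1, 2), 4.0)    # stand-in for the predictive variance
sampler = IndependentSamplerSketch(sample_size=3)
first = sampler.sample(mean, variance)
second = sampler.sample(mean, variance)                                 # identical to `first`
other = IndependentSamplerSketch(sample_size=3, seed=1).sample(mean, variance)
```

Since the same \(\epsilon\) is shared by every query point, each of the S samples varies continuously with the inputs whenever \(\mu\) and \(\sigma\) do.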
- class BatchReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.SupportsPredictJoint, qmc: bool = False, qmc_skip: bool = True)[source]#
Bases:
trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.SupportsPredictJoint]
This sampler employs the reparameterization trick to approximate batches of samples from a ProbabilisticModel’s predictive joint distribution as
\[x \mapsto \mu(x) + \epsilon L(x)\]
where \(L\) is the Cholesky factor s.t. \(LL^T\) is the covariance, and \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
- Parameters:
sample_size – The number of samples for each batch of points. Must be positive.
model – The model to sample from.
qmc – Whether to use QMC Sobol sampling instead of random normal sampling. QMC sampling approximates a normal distribution more accurately than truly random samples.
qmc_skip – Whether to use the skip parameter to ensure the QMC sampler gives different samples whenever it is reset. This is not supported with XLA.
- Raises:
ValueError (or InvalidArgumentError) – If sample_size is not positive.
- skip: trieste.types.TensorType[source]#
Number of Sobol sequence points to skip. This is incremented for each sampler.
- sample(at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) trieste.types.TensorType[source]#
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given BatchReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different BatchReparametrizationSampler instances will produce different samples.
- Parameters:
at – Batches of query points at which to sample the predictive distribution, with shape […, B, D], for batches of size B of points of dimension D. Must have a consistent batch size across all calls to sample() for any given BatchReparametrizationSampler.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns:
The samples, of shape […, S, B, L], where S is the sample_size, B the number of points per batch, and L the dimension of the model’s predictive distribution.
- Raises:
ValueError (or InvalidArgumentError) – If any of the following are true:
- at is a scalar.
- The batch size B of at is not positive.
- The batch size B of at differs from that of previous calls.
- jitter is negative.
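The joint case replaces the per-point standard deviation with a Cholesky factor of the covariance, and the fixed \(\epsilon\) is what ties the sampler to a single batch size. A minimal single-output NumPy sketch under those assumptions follows; BatchSamplerSketch is a hypothetical name and the joint mean and covariance are passed in directly rather than predicted by a model.

```python
import numpy as np


class BatchSamplerSketch:
    """Hypothetical NumPy analogue of a batch reparametrization sampler."""

    def __init__(self, sample_size: int, seed: int = 0) -> None:
        if sample_size <= 0:
            raise ValueError("sample_size must be positive")
        self._sample_size = sample_size
        self._rng = np.random.default_rng(seed)
        self._eps = None  # [S, B], fixed after the first call

    def sample(self, mean: np.ndarray, covariance: np.ndarray, jitter: float = 1e-6) -> np.ndarray:
        """Map a joint mean [B] and covariance [B, B] to samples [S, B]."""
        if jitter < 0:
            raise ValueError("jitter must be non-negative")
        batch_size = mean.shape[-1]
        if self._eps is None:
            self._eps = self._rng.standard_normal((self._sample_size, batch_size))
        elif self._eps.shape[-1] != batch_size:
            raise ValueError("batch size differs from that of previous calls")
        # Jitter on the diagonal keeps the Cholesky factorisation stable.
        chol = np.linalg.cholesky(covariance + jitter * np.eye(batch_size))
        return mean + self._eps @ chol.T


mean = np.array([0.0, 1.0])
covariance = np.array([[1.0, 0.8], [0.8, 1.0]])
sampler = BatchSamplerSketch(sample_size=20000)
samples = sampler.sample(mean, covariance)
repeat = sampler.sample(mean, covariance)   # identical to `samples`
```

The batch-size check mirrors the last two error conditions listed above: once \(\epsilon\) has shape [S, B], a call with a different B cannot reuse it.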
- class FeatureDecompositionInternalDataModel[source]#
Bases:
trieste.models.interfaces.SupportsGetKernel, trieste.models.interfaces.SupportsGetMeanFunction, trieste.models.interfaces.SupportsGetObservationNoise, trieste.models.interfaces.SupportsGetInternalData, typing_extensions.Protocol
A probabilistic model that supports get_kernel, get_mean_function, get_observation_noise and get_internal_data methods.
- class FeatureDecompositionInducingPointModel[source]#
Bases:
trieste.models.interfaces.SupportsGetKernel, trieste.models.interfaces.SupportsGetMeanFunction, trieste.models.interfaces.SupportsGetInducingVariables, typing_extensions.Protocol
A probabilistic model that supports get_kernel, get_mean_function and get_inducing_variables methods.
- class FeatureDecompositionTrajectorySampler(model: FeatureDecompositionTrajectorySamplerModelType, feature_functions: ResampleableRandomFourierFeatureFunctions)[source]#
Bases:
trieste.models.interfaces.TrajectorySampler[FeatureDecompositionTrajectorySamplerModelType], abc.ABC
This is a general class to build functions that approximate a trajectory sampled from an underlying Gaussian process model.
In particular, we approximate the Gaussian process’s posterior samples as the finite feature approximation
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]
where \(\phi_i\) are m features and \(\theta_i\) are feature weights sampled from a given distribution.
Achieving consistency (ensuring that the same sample draw is used for all evaluations of a particular trajectory function) for exact sample draws from a GP is prohibitively costly because it scales cubically with the number of query points. However, finite feature representations can be evaluated with a constant cost per query, regardless of the required number of queries.
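To make the finite feature approximation concrete, here is a minimal NumPy sketch of a single prior function draw. It assumes a unit-variance RBF kernel and cosine random Fourier features; the helper names features and trajectory are hypothetical. Once \(\theta\) is fixed, the trajectory is an ordinary deterministic function, so evaluating it at any subset of points is automatically consistent.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, input_dim, lengthscale = 2000, 1, 0.5

# Random Fourier features for a unit-variance RBF kernel (an assumption of this sketch).
omega = rng.standard_normal((input_dim, num_features)) / lengthscale
bias = rng.uniform(0.0, 2.0 * np.pi, size=num_features)


def features(x: np.ndarray) -> np.ndarray:
    """Evaluate the m feature functions: x [N, D] -> [N, m]."""
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + bias)


# One draw of the weights defines one trajectory.
theta = rng.standard_normal(num_features)


def trajectory(x: np.ndarray) -> np.ndarray:
    """Evaluate f_hat(x) = sum_i phi_i(x) theta_i at x [N, D] -> [N]."""
    return features(x) @ theta


x = np.linspace(-1.0, 1.0, 50)[:, None]
values = trajectory(x)
subset_values = trajectory(x[::5])   # agrees with values[::5]

# The feature inner product approximates the kernel that the draw comes from.
approx_kernel = features(x) @ features(x).T
exact_kernel = np.exp(-0.5 * (x - x.T) ** 2 / lengthscale**2)
```

Each evaluation costs O(m) per query point, in contrast to the cubic cost of drawing an exact joint sample over all query points.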
- Parameters:
model – The model to sample from.
- Raises:
ValueError – If dataset is empty.
- get_trajectory() trieste.models.interfaces.TrajectoryFunction[source]#
Generate an approximate function draw (trajectory) by sampling weights and evaluating the feature functions.
- Returns:
A trajectory function representing an approximate trajectory from the Gaussian process, taking an input of shape [N, B, D] and returning shape [N, B, L] where L is the number of outputs of the model.
- update_trajectory(trajectory: trieste.models.interfaces.TrajectoryFunction) trieste.models.interfaces.TrajectoryFunction[source]#
Efficiently update a TrajectoryFunction to reflect an update in its underlying ProbabilisticModel and resample accordingly.
For a FeatureDecompositionTrajectorySampler, updating the sampler corresponds to resampling the feature functions (taking into account any changed kernel parameters) and recalculating the weight distribution.
- Parameters:
trajectory – The trajectory function to be resampled.
- Returns:
The new resampled trajectory function.
- resample_trajectory(trajectory: trieste.models.interfaces.TrajectoryFunction) trieste.models.interfaces.TrajectoryFunction[source]#
Efficiently resample a TrajectoryFunction in-place to avoid function retracing with every new sample.
- Parameters:
trajectory – The trajectory function to be resampled.
- Returns:
The new resampled trajectory function.
- class RandomFourierFeatureTrajectorySampler(model: FeatureDecompositionInternalDataModel, num_features: int = 1000)[source]#
Bases:
FeatureDecompositionTrajectorySampler[FeatureDecompositionInternalDataModel]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model. For tractability, the Gaussian process is approximated with a Bayesian linear model across a set of features sampled from the Fourier feature decomposition of the model’s kernel. See [HernandezLHG14] for details. Currently we do not support models with multiple latent Gaussian processes.
In particular, we approximate the Gaussian process’s posterior samples as the finite feature approximation
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]
where \(\phi_i\) are m Fourier features and \(\theta_i\) are feature weights sampled from a posterior distribution that depends on the feature values at the model’s datapoints.
Our implementation follows [HernandezLHG14], with our calculations differing slightly depending on properties of the problem. In particular, we use different calculation strategies depending on the number of considered features m and the number of data points n.
If \(m<n\) then we follow Appendix A of [HernandezLHG14] and calculate the posterior distribution for \(\theta\) following their Bayesian linear regression motivation, i.e. the computation revolves around an O(m^3) inversion of a design matrix.
If \(n<m\) then we use the kernel trick to recast computation to revolve around an O(n^3) inversion of a gram matrix. As well as being more efficient in early BO steps (where \(n<m\)), this second computation method allows much larger choices of m (as required to approximate very flexible kernels).
- Parameters:
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises:
ValueError – If dataset is empty.
- _prepare_weight_sampler() Callable[[int], trieste.types.TensorType][source]#
Calculate the posterior of theta (the feature weights) for the RFFs, returning a function that takes in a batch size B and returns B samples for the weights of each of the F RFF features for one output.
- _prepare_theta_posterior_in_design_space() tensorflow_probability.distributions.MultivariateNormalTriL[source]#
Calculate the posterior of theta (the feature weights) in the design space. This distribution is a Gaussian
\[\theta \sim N(D^{-1}\Phi^Ty,D^{-1}\sigma^2)\]
where the [m,m] design matrix \(D=(\Phi^T\Phi + \sigma^2I_m)\) is defined for the [n,m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
- _prepare_theta_posterior_in_gram_space() tensorflow_probability.distributions.MultivariateNormalTriL[source]#
Calculate the posterior of theta (the feature weights) in the gram space.
\[\theta \sim N(\Phi^TG^{-1}y,I_m - \Phi^TG^{-1}\Phi)\]
where the [n,n] gram matrix \(G=(\Phi\Phi^T + \sigma^2I_n)\) is defined for the [n,m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
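The design-space and gram-space expressions describe the same Gaussian, which can be checked numerically. The sketch below uses NumPy with random stand-ins for \(\Phi\) and \(y\), and assumes a standard normal prior on \(\theta\) (as implied by the \(I_m\) term in the gram-space covariance). It is an illustration of the two formulas, not the library's code.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, noise_variance = 6, 20, 0.1

phi = rng.standard_normal((n, m))   # stand-in for feature evaluations at the training inputs
y = rng.standard_normal(n)          # stand-in for (centred) observations

# Design space: the work is dominated by the [m, m] matrix D.
design = phi.T @ phi + noise_variance * np.eye(m)
mean_design = np.linalg.solve(design, phi.T @ y)
cov_design = noise_variance * np.linalg.inv(design)

# Gram space: the work is dominated by the [n, n] matrix G.
gram = phi @ phi.T + noise_variance * np.eye(n)
mean_gram = phi.T @ np.linalg.solve(gram, y)
cov_gram = np.eye(m) - phi.T @ np.linalg.solve(gram, phi)

# Draw one weight sample from the shared posterior.
theta = mean_gram + np.linalg.cholesky(cov_gram) @ rng.standard_normal(m)
```

With n = 6 and m = 20 the gram-space route factorises a 6 by 6 matrix instead of a 20 by 20 one, which is the saving exploited when \(n<m\).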
- class DecoupledTrajectorySampler(model: FeatureDecompositionInducingPointModel | FeatureDecompositionInternalDataModel, num_features: int = 1000)[source]#
Bases:
FeatureDecompositionTrajectorySampler[Union[FeatureDecompositionInducingPointModel,FeatureDecompositionInternalDataModel]]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model using decoupled sampling. See [WBT+20] for an introduction to decoupled sampling.
Unlike our RandomFourierFeatureTrajectorySampler, which uses an RFF decomposition to approximate the Gaussian process posterior, a DecoupledTrajectorySampler only uses an RFF decomposition to approximate the Gaussian process prior and instead uses a canonical decomposition to discretize the effect of updating the prior on the given data.
In particular, we approximate the Gaussian process’s posterior samples as the finite feature approximation
\[\hat{f}(.) = \sum_{i=1}^L w_i\phi_i(.) + \sum_{j=1}^m v_jk(.,z_j)\]
where \(\phi_i(.)\) and \(w_i\) are the Fourier features and their weights that discretize the prior. In contrast, \(k(.,z_j)\) and \(v_j\) are the canonical features and their weights that discretize the data update.
The expression for \(v_j\) depends on whether we are using an exact Gaussian process or a sparse approximation. See eq. (13) in [WBT+20] for details.
Note that if a model is both of FeatureDecompositionInducingPointModel type and FeatureDecompositionInternalDataModel type, FeatureDecompositionInducingPointModel will take priority and inducing points will be used for computations rather than data.
- Parameters:
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises:
NotImplementedError – If the model is not of valid type.
- _prepare_weight_sampler() Callable[[int], trieste.types.TensorType][source]#
Prepare the sampler function that provides samples of the feature weights for both the RFF and canonical feature functions, i.e. we return a function that takes in a batch size B and returns B samples for the weights of each of the F RFF features and M canonical features for L outputs.
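A decoupled trajectory can be sketched in NumPy for the exact-GP case. The sketch assumes a unit-variance RBF kernel, takes the training inputs as the points \(z_j\), and assumes the pathwise update \(v = (K + \sigma^2 I)^{-1}(y - \hat{f}_{prior}(Z) - \varepsilon)\) with \(\varepsilon \sim N(0, \sigma^2 I)\). The helper names are hypothetical and the sparse case is not covered.

```python
import numpy as np

rng = np.random.default_rng(2)
num_features, lengthscale, noise_variance = 1000, 1.0, 1e-6


def rbf(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Unit-variance RBF kernel between inputs a [N, 1] and b [M, 1]."""
    return np.exp(-0.5 * (a - b.T) ** 2 / lengthscale**2)


# Training data: the canonical basis functions k(., z_j) are centred on these inputs.
z = np.arange(5.0)[:, None]
y = np.sin(z[:, 0])

# Prior part: random Fourier features with weights w ~ N(0, I).
omega = rng.standard_normal((1, num_features)) / lengthscale
bias = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
w = rng.standard_normal(num_features)


def prior_draw(x: np.ndarray) -> np.ndarray:
    """Evaluate the RFF approximation of a prior sample at x [N, 1] -> [N]."""
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + bias) @ w


# Data update: weights v of the canonical features (assumed exact-GP form).
eps = np.sqrt(noise_variance) * rng.standard_normal(len(z))
residual = y - prior_draw(z) - eps
v = np.linalg.solve(rbf(z, z) + noise_variance * np.eye(len(z)), residual)


def trajectory(x: np.ndarray) -> np.ndarray:
    """Prior draw plus the canonical-feature correction, x [N, 1] -> [N]."""
    return prior_draw(x) + rbf(x, z) @ v


at_data = trajectory(z)
```

With the very small noise variance chosen here, the trajectory passes almost exactly through the observations, which is the behaviour expected of a posterior sample.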
- class ResampleableRandomFourierFeatureFunctions(model: FeatureDecompositionInducingPointModel | FeatureDecompositionInternalDataModel, n_components: int)[source]#
Bases:
gpflux.layers.basis_functions.fourier_features.RandomFourierFeaturesCosine
A wrapper around GPflux’s random Fourier feature function that allows for efficient in-place updating when generating new decompositions.
In particular, the bias and weights are stored as variables, which can then be updated by calling resample() without triggering expensive graph retracing.
Note that if a model is both of FeatureDecompositionInducingPointModel type and FeatureDecompositionInternalDataModel type, FeatureDecompositionInducingPointModel will take priority and inducing points will be used for computations rather than data.
- Parameters:
model – The model that will be approximated by these feature functions.
n_components – The desired number of features.
- Raises:
NotImplementedError – If the model is not of valid type.
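The in-place update pattern can be illustrated with a NumPy analogue. This is only a sketch of the idea: the class ResampleableFeaturesSketch is hypothetical, fixed array buffers play the role that TensorFlow variables play in the wrapper, and the RBF-style frequency distribution is an assumption.

```python
import numpy as np


class ResampleableFeaturesSketch:
    """Hypothetical NumPy stand-in: weights and bias live in fixed buffers."""

    def __init__(self, input_dim: int, n_components: int, lengthscale: float = 1.0, seed: int = 0) -> None:
        self._rng = np.random.default_rng(seed)
        self._lengthscale = lengthscale
        self.W = np.empty((input_dim, n_components))
        self.b = np.empty(n_components)
        self.resample()

    def resample(self) -> None:
        """Draw a new decomposition, overwriting the existing buffers."""
        # Writing through [...] keeps the same array objects alive, so anything
        # holding a reference to W or b sees the new values immediately.
        self.W[...] = self._rng.standard_normal(self.W.shape) / self._lengthscale
        self.b[...] = self._rng.uniform(0.0, 2.0 * np.pi, size=self.b.shape)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """Evaluate the features at x [N, D] -> [N, n_components]."""
        n_components = self.b.shape[0]
        return np.sqrt(2.0 / n_components) * np.cos(x @ self.W + self.b)


features = ResampleableFeaturesSketch(input_dim=2, n_components=50)
weights_buffer = features.W      # reference taken before resampling
x = np.ones((3, 2))
before = features(x)
features.resample()
after = features(x)              # new decomposition, same buffers
```

In a compiled TensorFlow function the same principle applies: assigning new values to existing variables leaves the traced graph valid, whereas creating new tensors would force a retrace.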
- class ResampleableDecoupledFeatureFunctions(model: FeatureDecompositionInducingPointModel | FeatureDecompositionInternalDataModel, n_components: int)[source]#
Bases:
ResampleableRandomFourierFeatureFunctions
A wrapper around our ResampleableRandomFourierFeatureFunctions which, rather than evaluating just F RFF functions, evaluates the concatenation of F RFF functions with evaluations of the canonical basis functions.
Note that if a model is both of FeatureDecompositionInducingPointModel type and FeatureDecompositionInternalDataModel type, FeatureDecompositionInducingPointModel will take priority and inducing points will be used for computations rather than data.
- Parameters:
model – The model that will be approximated by these feature functions.
n_components – The desired number of features.
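Concatenating the two kinds of feature lets a single weight vector drive both parts of a decoupled trajectory. The NumPy sketch below assumes a unit-variance, unit-lengthscale RBF kernel, and the function names are hypothetical. It shows that the concatenated form equals the sum of the RFF term and the canonical term.

```python
import numpy as np

rng = np.random.default_rng(3)
num_rff, num_canonical, input_dim = 100, 4, 2

omega = rng.standard_normal((input_dim, num_rff))
bias = rng.uniform(0.0, 2.0 * np.pi, size=num_rff)
z = rng.uniform(size=(num_canonical, input_dim))  # inducing points or training inputs


def rff_features(x: np.ndarray) -> np.ndarray:
    """F random Fourier features: x [N, D] -> [N, F]."""
    return np.sqrt(2.0 / num_rff) * np.cos(x @ omega + bias)


def canonical_features(x: np.ndarray) -> np.ndarray:
    """M canonical features k(x, z_j): x [N, D] -> [N, M]."""
    sq_dist = ((x[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist)


def decoupled_features(x: np.ndarray) -> np.ndarray:
    """Concatenate both feature sets along the last axis: [N, F + M]."""
    return np.concatenate([rff_features(x), canonical_features(x)], axis=-1)


x = rng.uniform(size=(7, input_dim))
w = rng.standard_normal(num_rff)        # stand-in RFF weights
v = rng.standard_normal(num_canonical)  # stand-in canonical weights
combined = decoupled_features(x) @ np.concatenate([w, v])
separate = rff_features(x) @ w + canonical_features(x) @ v
```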
- class feature_decomposition_trajectory(feature_functions: Callable[[trieste.types.TensorType], trieste.types.TensorType], weight_sampler: Callable[[int], trieste.types.TensorType], mean_function: Callable[[trieste.types.TensorType], trieste.types.TensorType], encoder: trieste.space.EncoderFunction | None = None)[source]#
Bases:
trieste.models.interfaces.TrajectoryFunctionClass
An approximate sample from a Gaussian process’s posterior, represented as a finite weighted sum of features.
A trajectory is given by
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]
where \(\phi_i\) are m feature functions and \(\theta_i\) are feature weights sampled from a posterior distribution.
The number of trajectories (i.e. batch size) is determined from the first call of the trajectory. In order to change the batch size, a new TrajectoryFunction must be built.
- Parameters:
feature_functions – Set of feature functions.
weight_sampler – New sampler that generates feature weight samples.
mean_function – The underlying model’s mean function.
encoder – Optional encoder with which to transform input points.
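The way the batch size is fixed by the first call can be sketched with a small NumPy class. FeatureTrajectorySketch is a hypothetical single-output analogue (it returns shape [N, B] rather than [N, B, L] and omits the encoder), with the three callables standing in for the constructor arguments above.

```python
import numpy as np


class FeatureTrajectorySketch:
    """Hypothetical NumPy analogue of a feature-decomposition trajectory."""

    def __init__(self, feature_functions, weight_sampler, mean_function) -> None:
        self._features = feature_functions
        self._weight_sampler = weight_sampler
        self._mean_function = mean_function
        self._theta = None  # [B, m], sampled on the first call

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """x [N, B, D] -> [N, B]: column b is trajectory b evaluated at x[:, b]."""
        batch_size = x.shape[1]
        if self._theta is None:
            self._theta = self._weight_sampler(batch_size)
        elif self._theta.shape[0] != batch_size:
            raise ValueError("batch size is fixed by the first call")
        phi = self._features(x)  # [N, B, m]
        return np.einsum("nbm,bm->nb", phi, self._theta) + self._mean_function(x)


rng = np.random.default_rng(4)
m, input_dim = 64, 2
omega = rng.standard_normal((input_dim, m))
bias = rng.uniform(0.0, 2.0 * np.pi, size=m)

trajectory = FeatureTrajectorySketch(
    feature_functions=lambda x: np.sqrt(2.0 / m) * np.cos(x @ omega + bias),
    weight_sampler=lambda batch_size: rng.standard_normal((batch_size, m)),
    mean_function=lambda x: np.zeros(x.shape[:-1]),
)

x = rng.uniform(size=(10, 3, input_dim))
first = trajectory(x)    # samples weights for B = 3 trajectories
second = trajectory(x)   # reuses the same weights
```

Calling the same object with a different B raises an error in this sketch, which corresponds to the requirement that a new TrajectoryFunction be built to change the batch size.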