trieste.models.gpflow
This package contains the primary interface for Gaussian process models. It also contains a
number of TrainableProbabilisticModel
wrappers for GPflow-based models.
Submodules
Package Contents
- build_gpr(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, likelihood_variance: Optional[float] = None, trainable_likelihood: bool = False) → gpflow.models.GPR
Build a GPR model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance.
trainable_likelihood – If set to True, the Gaussian likelihood parameter is set to be trainable. By default set to False.
- Returns
A GPR model.
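As a quick illustration, here is a minimal sketch of building a GPR model over a two-dimensional unit-box search space; the data here is a toy assumption for the example, not part of the API:

import tensorflow as tf

from trieste.data import Dataset
from trieste.models.gpflow import build_gpr
from trieste.space import Box

# A hypothetical initial design over the unit square.
search_space = Box([0.0, 0.0], [1.0, 1.0])
query_points = search_space.sample(20)
observations = tf.reduce_sum(query_points ** 2, axis=-1, keepdims=True)
data = Dataset(query_points, observations)

# Default Matern52 kernel, Constant mean function and kernel priors.
gpr = build_gpr(data, search_space)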
- build_sgpr(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, likelihood_variance: Optional[float] = None, trainable_likelihood: bool = False, num_inducing_points: Optional[int] = None, trainable_inducing_points: bool = False) → gpflow.models.SGPR
Build an SGPR model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
For performance reasons the number of inducing points should not be changed during Bayesian optimization. Hence, even if the initial dataset is smaller, we advise setting this to a higher number. By default inducing points are set to Sobol samples for continuous search spaces, and simple random samples for discrete or mixed search spaces. This carries the risk that optimization gets stuck if they are not trainable, which calls for adaptive inducing point selection during the optimization. This functionality will be added to Trieste in the future.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance.
trainable_likelihood – If set to True, the Gaussian likelihood parameter is set to be trainable. By default set to False.
num_inducing_points – The number of inducing points can be optionally set to a certain value. If left unspecified (default), this number is set to either NUM_INDUCING_POINTS_PER_DIM times the dimensionality of the search space or the value given by MAX_NUM_INDUCING_POINTS, whichever is smaller.
trainable_inducing_points – If set to True, inducing points will be set to be trainable. This option should be used with caution. By default set to False.
- Returns
An SGPR model.
- build_svgp(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, classification: bool = False, kernel_priors: bool = True, likelihood_variance: Optional[float] = None, trainable_likelihood: bool = False, num_inducing_points: Optional[int] = None, trainable_inducing_points: bool = False) → gpflow.models.SVGP
Build an SVGP model with sensible initial parameters and priors. Both regression and binary classification models are available. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting.
For performance reasons the number of inducing points should not be changed during Bayesian optimization. Hence, even if the initial dataset is smaller, we advise setting this to a higher number. By default inducing points are set to Sobol samples for continuous search spaces, and simple random samples for discrete or mixed search spaces. This carries the risk that optimization gets stuck if they are not trainable, which calls for adaptive inducing point selection during the optimization. This functionality will be added to Trieste in the future.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
classification – If a classification model is needed, this should be set to True, in which case a Bernoulli likelihood will be used. If a regression model is required, this should be set to False (default), in which case a Gaussian likelihood is used.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale).
likelihood_variance – Likelihood (noise) variance parameter can be optionally set to a certain value. If left unspecified (default), the noise variance is set to maintain the signal-to-noise ratio given by SIGNAL_NOISE_RATIO_LIKELIHOOD, where the signal variance in the kernel is set to the empirical variance. This argument is ignored in the classification case.
trainable_likelihood – If set to True, the likelihood parameter is set to be trainable. By default set to False. This argument is ignored in the classification case.
num_inducing_points – The number of inducing points can be optionally set to a certain value. If left unspecified (default), this number is set to either NUM_INDUCING_POINTS_PER_DIM times the dimensionality of the search space or the value given by MAX_NUM_INDUCING_POINTS, whichever is smaller.
trainable_inducing_points – If set to True, inducing points will be set to be trainable. This option should be used with caution. By default set to False.
- Returns
An SVGP model.
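For the classification case, a minimal sketch; the synthetic binary labels below are purely illustrative:

import tensorflow as tf

from trieste.data import Dataset
from trieste.models.gpflow import build_svgp
from trieste.space import Box

search_space = Box([0.0, 0.0], [1.0, 1.0])
query_points = search_space.sample(50)
# Hypothetical binary labels in {0, 1}, stored as float observations.
observations = tf.cast(
    tf.reduce_sum(query_points, axis=-1, keepdims=True) > 1.0, tf.float64
)
data = Dataset(query_points, observations)

# classification=True selects a Bernoulli likelihood.
svgp = build_svgp(data, search_space, classification=True, num_inducing_points=25)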
- build_vgp_classifier(data: trieste.data.Dataset, search_space: trieste.space.SearchSpace, kernel_priors: bool = True, noise_free: bool = False, kernel_variance: Optional[float] = None) → gpflow.models.VGP
Build a VGP binary classification model with sensible initial parameters and priors. We use a Matern52 kernel and a Constant mean function in the model. We found the default configuration used here to work well in most situations, but it should not be taken as a universally good solution.
We set priors for kernel hyperparameters by default in order to stabilize model fitting. We found the priors below to be highly effective for objective functions defined over the unit hypercube. They do seem to work for other search space sizes, but we advise caution when using them in such search spaces. Using priors allows for using a maximum a posteriori estimate of these kernel parameters during model fitting. In the noise_free case we do not use a prior for the kernel variance parameter.
Note that although we scale parameters as a function of the size of the search space, ideally inputs should be normalised to the unit hypercube before building a model.
- Parameters
data – Dataset from the initial design, used for estimating the variance of observations.
search_space – Search space for performing Bayesian optimization, used for scaling the parameters.
kernel_priors – If set to True (default), priors are set for kernel parameters (variance and lengthscale). In the noise_free case the kernel variance prior is not set.
noise_free – If there is prior information that the classification problem is a deterministic one, this should be set to True and the kernel variance will be fixed to a higher default value, CLASSIFICATION_KERNEL_VARIANCE_NOISE_FREE, leading to a sharper classification boundary. In this case the prior for the kernel variance parameter is also not set. By default set to False.
kernel_variance – Kernel variance parameter can be optionally set to a certain value. If left unspecified (default), the kernel variance is set to CLASSIFICATION_KERNEL_VARIANCE_NOISE_FREE in the noise_free case and to CLASSIFICATION_KERNEL_VARIANCE otherwise.
- Returns
A VGP model.
- class InducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: abc.ABC, Generic[trieste.models.interfaces.ProbabilisticModelType]
This class provides functionality to update the inducing points of an inducing point-based model as the Bayesian optimization progresses.
The only constraint on subclasses of InducingPointSelector is that they preserve the shape of the inducing points so as not to trigger expensive retracing.
It can often be beneficial to change the inducing points during optimization, for example to allow the model to focus its limited modelling resources on promising areas of the space. See [VMA+21] for demonstrations of some of our InducingPointSelectors.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- calculate_inducing_points(self, current_inducing_points: trieste.types.TensorType, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType
Calculate the new inducing points given the existing inducing points.
If recalc_every_model_update is set to False then we only generate new inducing points for the first calculate_inducing_points() call, with all subsequent calls returning the current inducing points.
- Parameters
current_inducing_points – The current inducing points used by the model.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
- Raises
NotImplementedError – If model has more than one set of inducing variables.
- abstract _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType
Method for calculating new inducing points given a model and dataset.
This method is to be implemented by all subclasses of InducingPointSelector.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
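For illustration, a minimal subclass sketch that selects the M most recent training inputs. This is an illustrative strategy, not one shipped with Trieste, and it assumes the base class stores the constructor's search space as self._search_space:

import tensorflow as tf

from trieste.data import Dataset
from trieste.models.gpflow import InducingPointSelector
from trieste.models.interfaces import ProbabilisticModel
from trieste.types import TensorType


class MostRecentInducingPointSelector(InducingPointSelector[ProbabilisticModel]):
    def _recalculate_inducing_points(
        self, M: int, model: ProbabilisticModel, dataset: Dataset
    ) -> TensorType:
        # Take up to M of the most recent training inputs.
        num_recent = min(M, dataset.query_points.shape[0])
        points = dataset.query_points[-num_recent:]
        if num_recent < M:
            # Pad with uniform random samples from the search space
            # (assumes the base class stores it as self._search_space).
            padding = self._search_space.sample(M - num_recent)
            points = tf.concat([points, padding], axis=0)
        return points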
- class KMeansInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: InducingPointSelector[trieste.models.interfaces.ProbabilisticModel]
An InducingPointSelector that chooses points as centroids of a K-means clustering of the training data.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModel, dataset: trieste.data.Dataset) → trieste.types.TensorType
Calculate M centroids from a K-means clustering of the training data.
If the clustering returns fewer than M centroids, or if we have fewer than M training data points, then we fill the remaining points with random samples from across the search space.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The new updated inducing points.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty.
- class RandomSubSampleInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: InducingPointSelector[trieste.models.interfaces.ProbabilisticModel]
An InducingPointSelector that chooses points at random from the training data.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModel, dataset: trieste.data.Dataset) → trieste.types.TensorType
Sample M points from the training data without replacement. If we require more inducing points than training data points, then we fill the remaining points with random samples from across the search space.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The new updated inducing points.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty.
- class UniformInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)
Bases: InducingPointSelector[trieste.models.interfaces.ProbabilisticModel]
An InducingPointSelector that chooses points sampled uniformly across the search space.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
- _recalculate_inducing_points(self, M: int, model: trieste.models.interfaces.ProbabilisticModel, dataset: trieste.data.Dataset) → trieste.types.TensorType
Sample M points. If search_space is a Box then we use a space-filling Sobol design to ensure high diversity.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
- class GPflowPredictor(optimizer: Optimizer | None = None)
Bases: trieste.models.interfaces.SupportsPredictJoint, trieste.models.interfaces.SupportsGetKernel, trieste.models.interfaces.SupportsGetObservationNoise, trieste.models.interfaces.HasReparamSampler, abc.ABC
A trainable wrapper for a GPflow Gaussian process model.
- Parameters
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
- property optimizer(self) → trieste.models.optimizer.Optimizer
The optimizer with which to train the model.
- create_posterior_cache(self) → None
Create a posterior cache for fast sequential predictions. Note that this must happen at initialisation and after we ensure the model data is variable. Furthermore, the cache must be updated whenever the underlying model is changed.
- update_posterior_cache(self) → None
Update the posterior cache. This needs to be called whenever the underlying model is changed.
- property model(self) → gpflow.models.GPModel
The underlying GPflow model.
- predict(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points.
This is essentially a convenience method for predict_joint(), where non-event dimensions of query_points are all interpreted as broadcasting dimensions instead of batch dimensions, and the covariance is squeezed to remove redundant nesting.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- predict_joint(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
- Parameters
query_points – The points at which to make predictions, of shape […, B, D].
- Returns
The mean and covariance of the joint marginal distribution at each batch of points in query_points. For a predictive distribution with event shape E, the mean will have shape […, B] + E, and the covariance shape […] + E + [B, B].
- sample(self, query_points: trieste.types.TensorType, num_samples: int) → trieste.types.TensorType
Return num_samples samples from the independent marginal distributions at query_points.
- Parameters
query_points – The points at which to sample, with shape […, N, D].
num_samples – The number of samples at each point.
- Returns
The samples. For a predictive distribution with event shape E, this has shape […, S, N] + E, where S is the number of samples.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- get_kernel(self) → gpflow.kernels.Kernel
Return the kernel of the model.
- Returns
The kernel.
- get_mean_function(self) → gpflow.mean_functions.MeanFunction
Return the mean function of the model.
- Returns
The mean function.
- get_observation_noise(self) → trieste.types.TensorType
Return the variance of observation noise for homoscedastic likelihoods.
- Returns
The observation noise.
- Raises
NotImplementedError – If the model does not have a homoscedastic likelihood.
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
- Parameters
dataset – The data with which to optimize the model.
- log(self, dataset: Optional[trieste.data.Dataset] = None) → None
Log model-specific information at a given optimization step.
- Parameters
dataset – Optional data that can be used to log additional data-based model summaries.
- reparam_sampler(self, num_samples: int) → trieste.models.interfaces.ReparametrizationSampler[GPflowPredictor]
Return a reparametrization sampler providing num_samples samples.
- Returns
The reparametrization sampler.
- class GaussianProcessRegression(model: gpflow.models.GPR, optimizer: Optimizer | None = None, num_kernel_samples: int = 10, num_rff_features: int = 1000, use_decoupled_sampler: bool = True)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.interfaces.FastUpdateModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInternalData, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow GPR.
As Bayesian optimization requires a large number of sequential predictions (i.e. when maximizing acquisition functions), rather than calling the model directly at prediction time we instead call the posterior objects built by these models. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The GPflow model to wrap.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
num_kernel_samples – Number of randomly sampled kernels (for each kernel parameter) to evaluate before beginning model optimization. Therefore, for a kernel with p (vector-valued) parameters, we evaluate p * num_kernel_samples kernels.
num_rff_features – The number of random Fourier features used to approximate the kernel when calling trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
use_decoupled_sampler – If True use a decoupled random Fourier feature sampler, else just use a random Fourier feature sampler. The decoupled sampler suffers less from overestimating variance and can typically get away with a lower num_rff_features.
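For example, a sketch of wrapping a model built with build_gpr, reusing the toy data and search_space from the build_gpr example earlier on this page:

from trieste.models.gpflow import GaussianProcessRegression, build_gpr

# `data` and `search_space` as in the build_gpr sketch above.
model = GaussianProcessRegression(build_gpr(data, search_space))

model.optimize(data)  # fit kernel parameters, trying sampled initializations
mean, variance = model.predict(data.query_points)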
- property model(self) → gpflow.models.GPR
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(self, dataset: trieste.data.Dataset) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
\[\Sigma_{12} = K_{12} - K_{x1}(K_{xx} + \sigma^2 I)^{-1}K_{x2}\]
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, N, D]
query_points_2 – Sets of query points with shape [M, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, N, M] (L being the number of latent GPs = number of output dimensions)
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
For GaussianProcessRegression, we (optionally) try multiple randomly sampled kernel parameter configurations as well as the configuration specified when initializing the kernel. The best configuration is used as the starting point for model optimization.
For trainable parameters constrained to lie in a finite interval (through a sigmoid bijector), we begin model optimization from the best of a random sample from these parameters’ acceptable domains.
For trainable parameters without constraints but with priors, we begin model optimization from the best of a random sample from these parameters’ priors.
For trainable parameters with neither priors nor constraints, we begin optimization from their initial values.
- Parameters
dataset – The data with which to optimize the model.
- find_best_model_initialization(self, num_kernel_samples: int) → None
Test num_kernel_samples models with sampled kernel parameters. The model’s kernel parameters are then set to the sample achieving maximal likelihood.
- Parameters
num_kernel_samples – Number of randomly sampled kernels to evaluate.
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[GaussianProcessRegression]
Return a trajectory sampler. For GaussianProcessRegression, we build trajectories using a random Fourier feature approximation.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- get_internal_data(self) → trieste.data.Dataset
Return the model’s training data.
- Returns
The model’s training data.
- conditional_predict_f(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Returns the marginal GP distribution at query_points conditioned on both the model and some additional data, using an exact formula. See [CGE14] (eqs. 8-10) for details.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns
mean_qp_new: predictive mean at query_points, with shape […, M, L], and var_qp_new: predictive variance at query_points, with shape […, M, L]
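As a sketch, conditioning on hypothetical extra observations without retraining, following the shapes documented above (model is a GaussianProcessRegression wrapper as in the earlier sketch):

import tensorflow as tf

from trieste.data import Dataset

# Hypothetical additional data: N=2 points with L=1 output.
additional_data = Dataset(
    tf.constant([[0.2, 0.3], [0.6, 0.1]], dtype=tf.float64),
    tf.constant([[0.15], [0.40]], dtype=tf.float64),
)
query_points = tf.constant([[0.5, 0.5]], dtype=tf.float64)  # M=1, D=2

# Both outputs have shape [M, L] = [1, 1].
mean, var = model.conditional_predict_f(query_points, additional_data)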
- conditional_predict_joint(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Predicts the joint GP distribution at query_points conditioned on both the model and some additional data, using an exact formula. See [CGE14] (eqs. 8-10) for details.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns
mean_qp_new: predictive mean at query_points, with shape […, M, L], and cov_qp_new: predictive covariance between query_points, with shape […, L, M, M]
- conditional_predict_f_sample(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset, num_samples: int) → trieste.types.TensorType
Generates samples of the GP at query_points conditioned on both the model and some additional data.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
num_samples – number of samples
- Returns
samples of f at query points, with shape […, num_samples, M, L]
- conditional_predict_y(self, query_points: trieste.types.TensorType, additional_data: trieste.data.Dataset) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Generates samples of y from the GP at query_points conditioned on both the model and some additional data.
- Parameters
query_points – Set of query points with shape [M, D]
additional_data – Dataset with query_points with shape […, N, D] and observations with shape […, N, L]
- Returns
predictive mean at query_points, with shape […, M, L], and predictive variance at query_points, with shape […, M, L]
- class SparseGaussianProcessRegression(model: gpflow.models.SGPR, optimizer: Optimizer | None = None, num_rff_features: int = 1000, inducing_point_selector: Optional[trieste.models.gpflow.inducing_point_selectors.InducingPointSelector[SparseGaussianProcessRegression]] = None)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.SupportsGetInternalData, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow SGPR. At the moment we only support models with a single latent GP. This is due to the compute_qu method in SGPR, used for computing covariance between query points and for trajectory sampling, which at the moment works only for a single latent GP.
Similarly to our GaussianProcessRegression, our SGPR wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The GPflow model to wrap.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
num_rff_features – The number of random Fourier features used to approximate the kernel when calling trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
inducing_point_selector – The (optional) desired inducing point selector that will update the underlying GPflow SGPR model’s inducing points as the optimization progresses.
- Raises
NotImplementedError (or ValueError) – If we try to use a model with an invalid num_rff_features, or an inducing_point_selector with a model that has more than one set of inducing points.
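A sketch of pairing the wrapper with an inducing point selector, reusing the toy data and search_space from the earlier examples:

from trieste.models.gpflow import (
    KMeansInducingPointSelector,
    SparseGaussianProcessRegression,
    build_sgpr,
)

# `data` and `search_space` as in the earlier sketches.
model = SparseGaussianProcessRegression(
    build_sgpr(data, search_space, num_inducing_points=50),
    inducing_point_selector=KMeansInducingPointSelector(search_space),
)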
- property model(self) → gpflow.models.SGPR
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
- Parameters
dataset – The data with which to optimize the model.
- update(self, dataset: trieste.data.Dataset) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
- _update_inducing_variables(self, new_inducing_points: trieste.types.TensorType) → None
When updating the inducing points of a model, we must also update the other inducing variables, i.e. q_mu and q_sqrt, accordingly. The exact form of this update depends on whether we are using whitened representations of the inducing variables. See _whiten_points() for details.
- Parameters
new_inducing_points – The desired values for the new inducing points.
- Raises
NotImplementedError – If we try to update the inducing variables of a model that has more than one set of inducing points.
- get_inducing_variables(self) → Tuple[Union[trieste.types.TensorType, list[trieste.types.TensorType]], trieste.types.TensorType, trieste.types.TensorType, bool]
Return the model’s inducing variables. The SGPR model does not have q_mu, q_sqrt and whiten objects. We can use the compute_qu method to obtain q_mu and q_sqrt, while the SGPR model does not use the whitened representation. Note that at the moment compute_qu works only for a single latent GP and returns q_sqrt in a shape that is inconsistent with the SVGP model (hence we need to modify its shape).
- Returns
The inducing points (i.e. locations of the inducing variables), as a Tensor or a list of Tensors (when the model has multiple inducing points); a tensor containing the variational mean q_mu; a tensor containing the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- Raises
NotImplementedError – If the model has more than one latent GP.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[SparseGaussianProcessRegression]
Return a trajectory sampler. For SparseGaussianProcessRegression, we build trajectories using a decoupled random Fourier feature approximation. Note that this is available only for single output models.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- get_internal_data(self) → trieste.data.Dataset
Return the model’s training data.
- Returns
The model’s training data.
- class SparseVariational(model: gpflow.models.SVGP, optimizer: Optimizer | None = None, num_rff_features: int = 1000, inducing_point_selector: Optional[trieste.models.gpflow.inducing_point_selectors.InducingPointSelector[SparseVariational]] = None)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow SVGP.
Similarly to our GaussianProcessRegression, our SVGP wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The underlying GPflow sparse variational model.
optimizer – The optimizer with which to train the model. Defaults to BatchOptimizer with Adam with batch size 100.
num_rff_features – The number of random Fourier features used to approximate the kernel when performing decoupled Thompson sampling through its trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
inducing_point_selector – The (optional) desired inducing point selector that will update the underlying GPflow sparse variational model’s inducing points as the optimization progresses.
- Raises
NotImplementedError – If we try to use an inducing_point_selector with a model that has more than one set of inducing points.
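A sketch of wrapping an SVGP with the default-style batch optimizer made explicit; the batch size here is illustrative, and data and search_space are the earlier toy assumptions:

import tensorflow as tf

from trieste.models.gpflow import SparseVariational, build_svgp
from trieste.models.optimizer import BatchOptimizer

# `data` and `search_space` as in the earlier sketches.
model = SparseVariational(
    build_svgp(data, search_space, num_inducing_points=50),
    optimizer=BatchOptimizer(tf.optimizers.Adam(), batch_size=100),
)
model.optimize(data)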
- property model(self) → gpflow.models.SVGP
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(self, dataset: trieste.data.Dataset) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
- optimize(self, dataset: trieste.data.Dataset) → None
Optimize the model with the specified dataset.
- Parameters
dataset – The data with which to optimize the model.
- _update_inducing_variables(self, new_inducing_points: trieste.types.TensorType) → None
When updating the inducing points of a model, we must also update the other inducing variables, i.e. q_mu and q_sqrt, accordingly. The exact form of this update depends on whether we are using whitened representations of the inducing variables. See _whiten_points() for details.
- Parameters
new_inducing_points – The desired values for the new inducing points.
- Raises
NotImplementedError – If we try to update the inducing variables of a model that has more than one set of inducing points.
- get_inducing_variables(self) → Tuple[Union[trieste.types.TensorType, list[trieste.types.TensorType]], trieste.types.TensorType, trieste.types.TensorType, bool]
Return the model’s inducing variables.
- Returns
The inducing points (i.e. locations of the inducing variables), as a Tensor or a list of Tensors (when the model has multiple inducing points); a tensor containing the variational mean q_mu; a tensor containing the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[SparseVariational]
Return a trajectory sampler. For SparseVariational, we build trajectories using a decoupled random Fourier feature approximation.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- class VariationalGaussianProcess(model: gpflow.models.VGP, optimizer: Optimizer | None = None, use_natgrads: bool = False, natgrad_gamma: Optional[float] = None, num_rff_features: int = 1000)
Bases: trieste.models.gpflow.interface.GPflowPredictor, trieste.models.interfaces.TrainableProbabilisticModel, trieste.models.gpflow.interface.SupportsCovarianceBetweenPoints, trieste.models.interfaces.SupportsGetInducingVariables, trieste.models.interfaces.HasTrajectorySampler
A TrainableProbabilisticModel wrapper for a GPflow VGP.
A Variational Gaussian Process (VGP) approximates the posterior of a GP using the multivariate Gaussian closest to the posterior of the GP by minimizing the KL divergence between the approximated and exact posteriors. See [OA09] for details.
The VGP provides (approximate) GP modelling under non-Gaussian likelihoods, for example when fitting a classification model over binary data.
A whitened representation and (optional) natural gradient steps are used to aid model optimization.
Similarly to our GaussianProcessRegression, our VGP wrapper directly calls the posterior objects built by these models at prediction time. These posterior objects store the pre-computed Gram matrices, which can be reused to allow faster subsequent predictions. However, note that these posterior objects need to be updated whenever the underlying model is changed, by calling update_posterior_cache() (this happens automatically after calls to update() or optimize()).
- Parameters
model – The GPflow VGP.
optimizer – The optimizer with which to train the model. Defaults to Optimizer with Scipy.
use_natgrads – If True then alternate model optimization steps with natural gradient updates. Note that natural gradients require a BatchOptimizer wrapper with an Optimizer optimizer.
num_rff_features – The number of random Fourier features used to approximate the kernel when performing decoupled Thompson sampling through its trajectory_sampler(). We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
natgrad_gamma – Gamma parameter for the natural gradient optimizer.
- Raises
ValueError (or InvalidArgumentError) – If the model’s q_sqrt is not rank 3, or if attempting to combine natural gradients with a Scipy optimizer.
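A sketch combining the classifier builder with natural gradient updates; the gamma value is an illustrative choice, and data and search_space are the earlier classification toy assumptions:

import tensorflow as tf

from trieste.models.gpflow import VariationalGaussianProcess, build_vgp_classifier
from trieste.models.optimizer import BatchOptimizer

# `data` and `search_space` as in the classification sketch above.
model = VariationalGaussianProcess(
    build_vgp_classifier(data, search_space),
    optimizer=BatchOptimizer(tf.optimizers.Adam()),  # natgrads need BatchOptimizer
    use_natgrads=True,
    natgrad_gamma=0.1,  # illustrative step size for the natural gradient steps
)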
- property model(self) → gpflow.models.VGP
The underlying GPflow model.
- predict_y(self, query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]
Return the mean and variance of the independent marginal distributions at each point in query_points for the observations, including noise contributions.
Note that this is not supported by all models.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in query_points. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
- update(self, dataset: trieste.data.Dataset, *, jitter: float = DEFAULTS.JITTER) → None
Update the model given the specified dataset. Does not train the model.
- Parameters
dataset – The data with which to update the model.
jitter – The size of the jitter to use when stabilizing the Cholesky decomposition of the covariance matrix.
- optimize(self, dataset: trieste.data.Dataset) → None
VariationalGaussianProcess has a custom optimize method that (optionally) permits alternating between standard optimization steps (for kernel parameters) and natural gradient steps for the variational parameters (q_mu and q_sqrt). See [SEH18] for details. Using natural gradients can dramatically speed up model fitting, especially for ill-conditioned posteriors.
If using natural gradients, our optimizer inherits the mini-batch behavior and number of optimization steps of the base optimizer specified when initializing the VariationalGaussianProcess.
- get_inducing_variables(self) → Tuple[trieste.types.TensorType, trieste.types.TensorType, trieste.types.TensorType, bool]
Return the model’s inducing variables. Note that GPflow’s VGP model is hard-coded to use the whitened representation.
- Returns
Tensors containing: the inducing points (i.e. locations of the inducing variables); the variational mean q_mu; the Cholesky decomposition of the variational covariance q_sqrt; and a bool denoting if we are using whitened or non-whitened representations.
- trajectory_sampler(self) → trieste.models.interfaces.TrajectorySampler[VariationalGaussianProcess]
Return a trajectory sampler. For VariationalGaussianProcess, we build trajectories using a decoupled random Fourier feature approximation.
At the moment only models with a single latent GP are supported.
- Returns
The trajectory sampler.
- Raises
NotImplementedError – If we try to use the sampler with a model that has more than one latent GP.
- covariance_between_points(self, query_points_1: trieste.types.TensorType, query_points_2: trieste.types.TensorType) → trieste.types.TensorType
Compute the posterior covariance between sets of query points.
Note that query_points_2 must be a rank 2 tensor, but query_points_1 can have leading dimensions.
- Parameters
query_points_1 – Set of query points with shape […, A, D]
query_points_2 – Sets of query points with shape [B, D]
- Returns
Covariance matrix between the sets of query points with shape […, L, A, B] (L being the number of latent GPs = number of output dimensions)
- class BatchReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.SupportsPredictJoint)
Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.SupportsPredictJoint]
This sampler employs the reparameterization trick to approximate batches of samples from a ProbabilisticModel’s predictive joint distribution as
\[x \mapsto \mu(x) + \epsilon L(x)\]
where \(L\) is the Cholesky factor s.t. \(LL^T\) is the covariance, and \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
- Parameters
sample_size – The number of samples for each batch of points. Must be positive.
model – The model to sample from.
- Raises
ValueError (or InvalidArgumentError) – If sample_size is not positive.
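A sketch of drawing consistent batch samples; model is any wrapper supporting predict_joint, e.g. the GaussianProcessRegression wrapper from the earlier sketches:

import tensorflow as tf

from trieste.models.gpflow import BatchReparametrizationSampler

sampler = BatchReparametrizationSampler(sample_size=100, model=model)

at = tf.random.uniform([5, 2], dtype=tf.float64)  # one batch of B=5 points, D=2
samples = sampler.sample(at)  # shape [100, 5, L]; repeated calls are identical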
- sample(self, at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) → trieste.types.TensorType
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given BatchReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different BatchReparametrizationSampler instances will produce different samples.
- Parameters
at – Batches of query points at which to sample the predictive distribution, with shape […, B, D], for batches of size B of points of dimension D. Must have a consistent batch size across all calls to sample() for any given BatchReparametrizationSampler.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns
The samples, of shape […, S, B, L], where S is the sample_size, B the number of points per batch, and L the dimension of the model’s predictive distribution.
- Raises
ValueError (or InvalidArgumentError) – If any of the following are true: at is a scalar; the batch size B of at is not positive; the batch size B of at differs from that of previous calls; or jitter is negative.
- class DecoupledTrajectorySampler(model: Union[FeatureDecompositionInducingPointModel, FeatureDecompositionInternalDataModel], num_features: int = 1000)
Bases: FeatureDecompositionTrajectorySampler[Union[FeatureDecompositionInducingPointModel, FeatureDecompositionInternalDataModel]]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model using decoupled sampling. See [WBT+20] for an introduction to decoupled sampling. Currently we do not support models with multiple latent Gaussian processes.
Unlike our RandomFourierFeatureTrajectorySampler, which uses an RFF decomposition to approximate the Gaussian process posterior, a DecoupledTrajectorySampler only uses an RFF decomposition to approximate the Gaussian process prior, and instead uses a canonical decomposition to discretize the effect of updating the prior on the given data.
In particular, we approximate the Gaussian process posterior samples as the finite feature approximation
\[\hat{f}(.) = \sum_{i=1}^L w_i\phi_i(.) + \sum_{j=1}^m v_jk(.,z_j)\]
where \(\phi_i(.)\) and \(w_i\) are the Fourier features and their weights that discretize the prior. In contrast, \(k(.,z_j)\) and \(v_j\) are the canonical features and their weights that discretize the data update.
The expression for \(v_j\) depends on whether we are using an exact Gaussian process or a sparse approximation. See eq. (13) in [WBT+20] for details.
Note that if a model is both of FeatureDecompositionInducingPointModel type and FeatureDecompositionInternalDataModel type, FeatureDecompositionInducingPointModel will take priority and inducing points will be used for computations rather than data.
- Parameters
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises
NotImplementedError – If the model is not of valid type.
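A sketch of drawing and evaluating a trajectory; model is assumed to be a wrapper exposing internal data, e.g. the GaussianProcessRegression wrapper from above, and the [N, B, D] input convention for trajectories is an assumption of this example:

from trieste.models.gpflow import DecoupledTrajectorySampler

sampler = DecoupledTrajectorySampler(model, num_features=1000)
trajectory = sampler.get_trajectory()

# Evaluate the sampled function: inputs of shape [N, B, D], where B is the
# number of trajectories in the batch (assumed shape convention).
values = trajectory(data.query_points[:, None, :])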
- _prepare_weight_sampler(self) → Callable[[int], trieste.types.TensorType]
Prepare the sampler function that provides samples of the feature weights for both the RFF and canonical feature functions, i.e. we return a function that takes in a batch size B and returns B samples for the weights of each of the L RFF features and N canonical features.
- class IndependentReparametrizationSampler(sample_size: int, model: trieste.models.interfaces.ProbabilisticModel)
Bases: trieste.models.interfaces.ReparametrizationSampler[trieste.models.interfaces.ProbabilisticModel]
This sampler employs the reparameterization trick to approximate samples from a ProbabilisticModel’s predictive distribution as
\[x \mapsto \mu(x) + \epsilon \sigma(x)\]
where \(\epsilon \sim \mathcal N (0, 1)\) is constant for a given sampler, thus ensuring samples form a continuous curve.
- Parameters
sample_size – The number of samples to take at each point. Must be positive.
model – The model to sample from.
- Raises
ValueError (or InvalidArgumentError) – If sample_size is not positive.
- sample(self, at: trieste.types.TensorType, *, jitter: float = DEFAULTS.JITTER) → trieste.types.TensorType
Return approximate samples from the model specified at __init__(). Multiple calls to sample(), for any given IndependentReparametrizationSampler and at, will produce the exact same samples. Calls to sample() on different IndependentReparametrizationSampler instances will produce different samples.
- Parameters
at – Where to sample the predictive distribution, with shape […, 1, D], for points of dimension D.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
- Returns
The samples, of shape […, S, 1, L], where S is the sample_size and L is the number of latent model dimensions.
- Raises
ValueError (or InvalidArgumentError) – If at has an invalid shape or jitter is negative.
- class RandomFourierFeatureTrajectorySampler(model: FeatureDecompositionInternalDataModel, num_features: int = 1000)
Bases: FeatureDecompositionTrajectorySampler[FeatureDecompositionInternalDataModel]
This class builds functions that approximate a trajectory sampled from an underlying Gaussian process model. For tractability, the Gaussian process is approximated with a Bayesian linear model across a set of features sampled from the Fourier feature decomposition of the model’s kernel. See [HernandezLHG14] for details. Currently we do not support models with multiple latent Gaussian processes.
In particular, we approximate the Gaussian process posterior samples as the finite feature approximation
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]
where \(\phi_i\) are m Fourier features and \(\theta_i\) are feature weights sampled from a posterior distribution that depends on the feature values at the model’s datapoints.
Our implementation follows [HernandezLHG14], with our calculations differing slightly depending on properties of the problem. In particular, we use different calculation strategies depending on the number of considered features m and the number of data points n.
If \(m < n\) then we follow Appendix A of [HernandezLHG14] and calculate the posterior distribution for \(\theta\) following their Bayesian linear regression motivation, i.e. the computation revolves around an O(m^3) inversion of a design matrix.
If \(n < m\) then we use the kernel trick to recast the computation to revolve around an O(n^3) inversion of a Gram matrix. As well as being more efficient in early BO steps (where \(n < m\)), this second computation method allows much larger choices of m (as required to approximate very flexible kernels).
- Parameters
model – The model to sample from.
num_features – The number of features used to approximate the kernel. We use a default of 1000 as it typically performs well for a wide range of kernels. Note that very smooth kernels (e.g. RBF) can be well-approximated with fewer features.
- Raises
ValueError – If dataset is empty.
- _prepare_weight_sampler(self) → Callable[[int], trieste.types.TensorType]
Calculate the posterior of theta (the feature weights) for the RFFs, returning a function that takes in a batch size B and returns B samples for the weights of each of the L RFF features.
- _prepare_theta_posterior_in_design_space(self) → tensorflow_probability.distributions.MultivariateNormalTriL
Calculate the posterior of theta (the feature weights) in the design space. This distribution is a Gaussian
\[\theta \sim N(D^{-1}\Phi^Ty, D^{-1}\sigma^2)\]
where the [m, m] design matrix \(D = \Phi^T\Phi + \sigma^2I_m\) is defined for the [n, m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
- _prepare_theta_posterior_in_gram_space(self) → tensorflow_probability.distributions.MultivariateNormalTriL
Calculate the posterior of theta (the feature weights) in the Gram space.
\[\theta \sim N(\Phi^TG^{-1}y, I_m - \Phi^TG^{-1}\Phi)\]
where the [n, n] Gram matrix \(G = \Phi\Phi^T + \sigma^2I_n\) is defined for the [n, m] matrix of feature evaluations across the training data \(\Phi\) and observation noise variance \(\sigma^2\).
- class feature_decomposition_trajectory(feature_functions: Callable[[trieste.types.TensorType], trieste.types.TensorType], weight_sampler: Callable[[int], trieste.types.TensorType], mean_function: Callable[[trieste.types.TensorType], trieste.types.TensorType])
Bases: trieste.models.interfaces.TrajectoryFunctionClass
An approximate sample from a Gaussian process posterior, represented as a finite weighted sum of features.
A trajectory is given by
\[\hat{f}(x) = \sum_{i=1}^m \phi_i(x)\theta_i\]
where \(\phi_i\) are m feature functions and \(\theta_i\) are feature weights sampled from a posterior distribution.
The number of trajectories (i.e. batch size) is determined from the first call of the trajectory. In order to change the batch size, a new TrajectoryFunction must be built.
- Parameters
feature_functions – Set of feature functions.
weight_sampler – Sampler that generates feature weight samples.
mean_function – The underlying model’s mean function.
- __call__(self, x: trieste.types.TensorType) → trieste.types.TensorType
Call the trajectory function.
- update(self, weight_sampler: Callable[[int], trieste.types.TensorType]) → None
Efficiently update the trajectory with a new weight distribution and resample its weights.
- Parameters
weight_sampler – New sampler that generates feature weight samples.
- assert_data_is_compatible(new_data: trieste.data.Dataset, existing_data: trieste.data.Dataset) → None
Checks that new data is compatible with existing data.
- Parameters
new_data – New data.
existing_data – Existing data.
- Raises
ValueError – If the trailing dimensions of the query points or observations differ.
- check_optimizer(optimizer: Union[trieste.models.optimizer.BatchOptimizer, trieste.models.optimizer.Optimizer]) → None
Check that the optimizer for the GPflow models is using a correct optimizer wrapper.
Stochastic gradient descent based methods implemented in TensorFlow would not work properly without mini-batches, and hence the BatchOptimizer wrapper, which prepares mini-batches and calls the optimizer iteratively, needs to be used. GPflow’s Scipy optimizer, on the other hand, should use the non-batch wrapper Optimizer.
- Parameters
optimizer – An instance of the optimizer wrapper with the underlying optimizer.
- Raises
ValueError – If a TensorFlow optimizer is not using BatchOptimizer, or if Scipy is using BatchOptimizer.
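For instance, a sketch of valid and invalid pairings:

import gpflow
import tensorflow as tf

from trieste.models.gpflow import check_optimizer
from trieste.models.optimizer import BatchOptimizer, Optimizer

check_optimizer(BatchOptimizer(tf.optimizers.Adam()))  # OK: Adam needs mini-batches
check_optimizer(Optimizer(gpflow.optimizers.Scipy()))  # OK: Scipy uses full batches
# check_optimizer(Optimizer(tf.optimizers.Adam()))     # would raise ValueError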
- randomize_hyperparameters(object: gpflow.Module) → None
Sets hyperparameters to random samples from their constrained domains or (if no constraints are available) their prior distributions.
- Parameters
object – Any gpflow Module.
- squeeze_hyperparameters(object: gpflow.Module, alpha: float = 0.01, epsilon: float = 1e-07) → None
Squeezes the parameters to be strictly inside their range defined by the Sigmoid, or strictly greater than the limit defined by the Shift+Softplus. This avoids having Inf unconstrained values when the parameters are exactly at the boundary.
- Parameters
object – Any gpflow Module.
alpha – The proportion of the range with which to squeeze for the Sigmoid case.
epsilon – The value with which to offset the shift for the Softplus case.
- Raises
ValueError – If alpha is not in (0, 1) or epsilon <= 0.