trieste.models.gpflow.inducing_point_selectors
This module is the home of Trieste’s functionality for choosing the inducing points
of sparse variational Gaussian processes (i.e. our SparseVariational
wrapper).
Module Contents
-
class
InducingPointSelector
(recalc_every_model_update: bool = True)[source]# Bases: abc.ABC, Generic[trieste.models.interfaces.ProbabilisticModelType]
This class provides functionality to update the inducing points of an inducing point-based model as the Bayesian optimization progresses.
The only constraint on subclasses of InducingPointSelector is that they preserve the shape of the inducing points, so as not to trigger expensive retracing.
It can often be beneficial to change the inducing points during optimization, for example to allow the model to focus its limited modelling resources on promising areas of the space. See [VMA+21] for demonstrations of some of our InducingPointSelectors.
- Parameters
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
-
calculate_inducing_points
(current_inducing_points: trieste.types.TensorType, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Calculate the new inducing points given the existing inducing points.
If recalc_every_model_update is set to False, then we only generate new inducing points on the first calculate_inducing_points() call; on later calls we just return the current inducing points.
- Parameters
current_inducing_points – The current inducing points used by the model.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
- Raises
NotImplementedError – If model has more than one set of inducing variables.
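The caching behaviour controlled by recalc_every_model_update can be illustrated with a minimal, library-free sketch. The names SketchSelector, FirstMSelector and _recalculate are hypothetical stand-ins invented for this example, and plain Python lists replace tensors; this is not Trieste's implementation.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Point = List[float]


class SketchSelector(ABC):
    """Hypothetical stand-in for InducingPointSelector; not Trieste's code."""

    def __init__(self, recalc_every_model_update: bool = True) -> None:
        self._recalc_every_model_update = recalc_every_model_update
        self._initialized = False  # becomes True after the first recalculation

    def calculate_inducing_points(
        self, current: Sequence[Point], data: Sequence[Point]
    ) -> List[Point]:
        if self._initialized and not self._recalc_every_model_update:
            return [list(p) for p in current]  # reuse the existing points
        self._initialized = True
        new_points = self._recalculate(len(current), data)
        # Subclasses must preserve the number of inducing points.
        assert len(new_points) == len(current)
        return new_points

    @abstractmethod
    def _recalculate(self, M: int, data: Sequence[Point]) -> List[Point]:
        """Return M new inducing points."""


class FirstMSelector(SketchSelector):
    """Toy rule: take the first M data points (assumes len(data) >= M)."""

    def _recalculate(self, M: int, data: Sequence[Point]) -> List[Point]:
        return [list(p) for p in data[:M]]


selector = FirstMSelector(recalc_every_model_update=False)
start = [[0.0], [0.0]]
first = selector.calculate_inducing_points(start, [[1.0], [2.0], [3.0]])
second = selector.calculate_inducing_points(first, [[7.0], [8.0], [9.0]])
```

With recalc_every_model_update=False, the second call ignores the new data and returns the points chosen on the first call; with True, every call recomputes them.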
-
abstract
_recalculate_inducing_points
(M: int, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Method for calculating new inducing points given a model and dataset.
This method must be implemented by all subclasses of InducingPointSelector.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
-
class
UniformInducingPointSelector
(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)[source]# Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]
An InducingPointSelector that chooses points sampled uniformly across the search space.
- Parameters
search_space – The global search space over which the optimization is defined.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
-
_recalculate_inducing_points
(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Sample M points. If search_space is a Box, then we use a space-filling Sobol design to ensure high diversity.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer.
- Returns
The new updated inducing points.
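To give a feel for what a space-filling design over a box looks like, here is a small stdlib-only sketch. It deliberately swaps in a Halton sequence, a different low-discrepancy construction that is much shorter to write than Sobol, so it illustrates the idea rather than the method used by the selector. The function names and the bounds are assumptions made for this example.

```python
from typing import List, Sequence


def van_der_corput(index: int, base: int) -> float:
    """Radical inverse of `index` in `base`, a value in [0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton_in_box(
    M: int, lower: Sequence[float], upper: Sequence[float]
) -> List[List[float]]:
    """Return M Halton points rescaled to the box [lower, upper]."""
    primes = [2, 3, 5, 7, 11, 13]  # one coprime base per dimension
    if len(lower) != len(upper) or len(lower) > len(primes):
        raise ValueError("need matching bounds and at most 6 dimensions")
    points = []
    for i in range(1, M + 1):  # skip index 0, which maps to the corner
        unit = [van_der_corput(i, primes[d]) for d in range(len(lower))]
        points.append(
            [lo + u * (hi - lo) for u, lo, hi in zip(unit, lower, upper)]
        )
    return points


# Four points in the hypothetical box [-1, 1] x [0, 10].
Z = halton_in_box(4, lower=[-1.0, 0.0], upper=[1.0, 10.0])
```

Unlike independent uniform draws, consecutive points of a low-discrepancy sequence avoid clumping, which is the property that motivates a Sobol design for inducing points.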
-
class
RandomSubSampleInducingPointSelector
(recalc_every_model_update: bool = True)[source]# Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]
An InducingPointSelector that chooses points at random from the training data.
- Parameters
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
-
_recalculate_inducing_points
(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Sample M points from the training data without replacement. If we require more inducing points than training data, then we fill the remaining points with random samples across the search space.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The new updated inducing points.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty.
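The subsample-then-pad rule can be sketched without any dependencies. In this sketch the padding bounds are passed in explicitly as lower and upper, which is an assumption of the example; how the real selector obtains the region to sample from is not shown here. The function name is likewise hypothetical.

```python
import random
from typing import List, Sequence


def subsample_with_padding(
    M: int,
    data: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    rng: random.Random,
) -> List[List[float]]:
    """Pick M points from `data` without replacement, padding if needed."""
    if len(data) == 0:
        raise ValueError("dataset must be populated")
    # random.sample draws without replacement.
    chosen = [list(p) for p in rng.sample(list(data), min(M, len(data)))]
    # Fewer data than requested points: fill up with uniform samples.
    while len(chosen) < M:
        chosen.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
    return chosen


rng = random.Random(0)
data = [[0.1], [0.2], [0.3]]
padded = subsample_with_padding(5, data, [0.0], [1.0], rng)  # 3 data + 2 random
subset = subsample_with_padding(2, data, [0.0], [1.0], rng)  # 2 of the 3 data
```

Either way the result always has exactly M points, which is what keeps the shape of the inducing points fixed.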
-
class
KMeansInducingPointSelector
(recalc_every_model_update: bool = True)[source]# Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]
An InducingPointSelector that chooses points as centroids of a K-means clustering of the training data.
- Parameters
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
-
_recalculate_inducing_points
(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Calculate M centroids from a K-means clustering of the training data.
If the clustering returns fewer than M centroids or if we have fewer than M training data, then we fill the remaining points with random samples across the search space.
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The new updated inducing points.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty.
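A minimal sketch of the centroid-then-pad rule follows, using a plain-Python Lloyd's algorithm as the K-means routine. The clustering routine, the function names and the explicit padding bounds are all assumptions of this example; the clustering backend and padding region used by the selector are not shown here.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(
    data: Sequence[Point], k: int, rng: random.Random, iters: int = 20
) -> List[List[float]]:
    """Lloyd's algorithm; may return fewer than k centroids."""
    centroids = [list(p) for p in rng.sample(list(data), min(k, len(data)))]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in data:
            nearest = min(
                range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j])
            )
            clusters[nearest].append(p)
        # Empty clusters are dropped, so the count can shrink below k.
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] for c in clusters if c
        ]
    return centroids


def kmeans_with_padding(
    M: int,
    data: Sequence[Point],
    lower: Sequence[float],
    upper: Sequence[float],
    rng: random.Random,
) -> List[List[float]]:
    if len(data) == 0:
        raise ValueError("dataset must be populated")
    points = kmeans_centroids(data, M, rng)
    # Pad with uniform samples when clustering yields fewer than M centroids.
    while len(points) < M:
        points.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
    return points


data = [[0.0], [0.1], [5.0], [5.1]]
centroids = sorted(kmeans_with_padding(2, data, [0.0], [6.0], random.Random(1)))
padded = kmeans_with_padding(6, data, [0.0], [6.0], random.Random(1))
```

In the first call the two well-separated groups give one centroid each; in the second call only four centroids are available, so two uniform samples are appended.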
-
class
QualityFunction
[source]# Bases: abc.ABC
A QualityFunction uses a model to measure the quality of each of the N query points in the provided dataset, returning shape [N].
-
abstract
__call__
(model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Evaluate the quality of the data-points according to the model.
- Parameters
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The quality scores.
-
class
DPPInducingPointSelector
(quality_function: QualityFunction, recalc_every_model_update: bool = True)[source]# Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]
An InducingPointSelector that follows [CZZ18] to get a greedy approximation to the MAP estimate of the specified Determinantal Point Process (DPP).
The DPP is defined through its diversity-quality decomposition, i.e. its similarity kernel is just the kernel of the considered model and its quality scores come from the provided QualityFunction.
- Parameters
quality_function – A function measuring the quality of each candidate inducing point.
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
-
_recalculate_inducing_points
(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]#
- Parameters
M – Desired number of inducing points.
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The new updated inducing points.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty.
-
class
UnitQualityFunction
[source]# Bases: QualityFunction
A QualityFunction where all points are considered equal, i.e. using this quality function for inducing point allocation corresponds to allocating inducing points with the sole aim of minimizing predictive variance.
-
__call__
(model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Evaluate the quality of the data-points according to the model.
- Parameters
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The quality scores.
-
-
class
ModelBasedImprovementQualityFunction
[source]# Bases: QualityFunction
A QualityFunction where the quality of each point is given by its expected improvement with respect to a conservative baseline. Expectations are taken with respect to the model from the previous BO step. See [MOP23] for details and justification.
-
__call__
(model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Evaluate the quality of the data-points according to the model.
- Parameters
model – The sparse model.
dataset – The data from the observer. Must be populated.
- Returns
The quality scores.
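As a rough illustration of an improvement-based quality score, the sketch below evaluates the standard closed-form expected improvement (for minimization) under a Gaussian predictive distribution at each point. Two things are assumptions of this example rather than statements about the class: the predictive means and variances are supplied directly as lists, and the "conservative" baseline is taken to be the largest predictive mean. See [MOP23] for the actual definition.

```python
import math
from typing import List, Sequence


def expected_improvement(
    means: Sequence[float], variances: Sequence[float], baseline: float
) -> List[float]:
    """E[max(baseline - f, 0)] for f ~ N(mean, variance), variance > 0."""
    scores = []
    for mu, var in zip(means, variances):
        sd = math.sqrt(var)
        z = (baseline - mu) / sd
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # and PDF
        scores.append((baseline - mu) * cdf + sd * pdf)
    return scores


means = [0.0, 1.0, 2.0]       # hypothetical predictive means
variances = [1.0, 1.0, 1.0]   # hypothetical predictive variances
baseline = max(means)         # assumed baseline: the worst predicted mean
quality = expected_improvement(means, variances, baseline)
```

With equal variances, points with lower predicted mean receive higher quality, so a DPP weighted by these scores favours promising regions while its kernel still enforces diversity.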
-
-
class
ConditionalVarianceReduction
(recalc_every_model_update: bool = True)[source]# Bases: DPPInducingPointSelector
An InducingPointSelector that greedily chooses the points with maximal (conditional) predictive variance, see [BRVDW19].
- Parameters
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
-
class
ConditionalImprovementReduction
(recalc_every_model_update: bool = True)[source]# Bases: DPPInducingPointSelector
An InducingPointSelector that greedily chooses points that have large predictive variance and are likely to be in promising regions of the search space, see [MOP23].
- Parameters
recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
-
greedy_inference_dpp
(M: int, kernel: gpflow.kernels.Kernel, quality_scores: trieste.types.TensorType, dataset: trieste.data.Dataset) → trieste.types.TensorType[source]# Get a greedy approximation of the MAP estimate of the Determinantal Point Process (DPP) over dataset following the algorithm of [CZZ18]. Note that we are using the quality-diversity decomposition of a DPP, specifying both a similarity kernel and quality_scores.
- Parameters
M – Desired set size.
kernel – The underlying kernel of the DPP.
quality_scores – The quality score of each item in dataset.
- Returns
The MAP estimate of the DPP.
- Raises
tf.errors.InvalidArgumentError – If dataset is empty or if the shape of quality_scores does not match that of dataset.observations.
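The greedy procedure can be sketched in plain Python as follows. The sketch builds the DPP kernel L[i][j] = q_i k(x_i, x_j) q_j from the quality-diversity decomposition and repeatedly selects the item with the largest marginal gain, updating the gains through an incremental Cholesky factor in the style of [CZZ18]. Several details are assumptions of this example and not claims about greedy_inference_dpp: it returns indices rather than points, it uses a hand-written RBF kernel instead of a GPflow kernel, and it stops early once the remaining gain is numerically zero.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def rbf(a: Point, b: Point, lengthscale: float = 1.0) -> float:
    """Squared-exponential kernel, used here as a stand-in similarity."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / lengthscale**2)


def greedy_dpp_map(
    M: int,
    kernel: Callable[[Point, Point], float],
    quality: Sequence[float],
    points: Sequence[Point],
) -> List[int]:
    """Greedily pick up to M indices approximating the DPP MAP set."""
    n = len(points)
    if n == 0 or len(quality) != n:
        raise ValueError("need a non-empty dataset and one score per point")

    def L(i: int, j: int) -> float:  # quality-diversity decomposition
        return quality[i] * kernel(points[i], points[j]) * quality[j]

    d2 = [L(i, i) for i in range(n)]          # current marginal gains
    chol: List[List[float]] = [[] for _ in range(n)]  # partial Cholesky rows
    selected: List[int] = []
    for _ in range(min(M, n)):
        j = max((i for i in range(n) if i not in selected), key=lambda i: d2[i])
        if d2[j] <= 1e-12:  # nothing left adds volume; stop early
            break
        selected.append(j)
        dj = math.sqrt(d2[j])
        for i in range(n):
            if i in selected:
                continue
            dot = sum(a * b for a, b in zip(chol[j], chol[i]))
            e = (L(j, i) - dot) / dj
            chol[i].append(e)
            d2[i] -= e * e  # gain shrinks for items similar to item j
    return selected


points = [[0.0], [0.01], [3.0]]
chosen = greedy_dpp_map(2, rbf, [1.0, 1.0, 1.0], points)
```

With equal quality scores, the near-duplicate of the first point is skipped in favour of the distant third point, which is the diversity effect of the similarity kernel. Lowering one point's score makes it correspondingly less likely to be chosen.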