trieste.models.gpflow.inducing_point_selectors

This module is the home of Trieste’s functionality for choosing the inducing points of sparse variational Gaussian processes (i.e. our SparseVariational wrapper).

Module Contents

class InducingPointSelector(recalc_every_model_update: bool = True)

Bases: abc.ABC, Generic[trieste.models.interfaces.ProbabilisticModelType]

This class provides functionality to update the inducing points of an inducing point-based model as the Bayesian optimization progresses.

The only constraint on subclasses of InducingPointSelector is that they preserve the shape of the inducing points, so as not to trigger expensive retracing.

It can often be beneficial to change the inducing points during optimization, for example to allow the model to focus its limited modelling resources on promising areas of the space. See [VMA+21] for demonstrations of some of our InducingPointSelectors.

Parameters

recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.

calculate_inducing_points(current_inducing_points: trieste.types.TensorType, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType

Calculate the new inducing points given the existing inducing points.

If recalc_every_model_update is False, new inducing points are generated only on the first calculate_inducing_points() call, and every later call returns the current inducing points unchanged. If it is True, the inducing points are recalculated on every call.

Parameters
  • current_inducing_points – The current inducing points used by the model.

  • model – The sparse model.

  • dataset – The data from the observer.

Returns

The new updated inducing points.

Raises

NotImplementedError – If model has more than one set of inducing variables.

abstract _recalculate_inducing_points(M: int, model: trieste.models.interfaces.ProbabilisticModelType, dataset: trieste.data.Dataset) → trieste.types.TensorType

Method for calculating new inducing points given a model and dataset.

This method is to be implemented by all subclasses of InducingPointSelector.

Parameters
  • M – Desired number of inducing points.

  • model – The sparse model.

  • dataset – The data from the observer.

Returns

The new updated inducing points.
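The interplay between calculate_inducing_points and _recalculate_inducing_points can be illustrated with a minimal pure-Python sketch. It is not Trieste's implementation: SketchSelector, FirstMSelector and the list-based point representation are hypothetical stand-ins, and the model argument is omitted for brevity.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Point = List[float]


class SketchSelector(ABC):
    """Hypothetical stand-in for InducingPointSelector (not Trieste code)."""

    def __init__(self, recalc_every_model_update: bool = True) -> None:
        self._recalc_every_model_update = recalc_every_model_update
        self._initialized = False  # becomes True after the first recalculation

    def calculate_inducing_points(
        self, current: Sequence[Point], data: Sequence[Point]
    ) -> List[Point]:
        # After the first call, reuse the current points unless told to recalculate.
        if self._initialized and not self._recalc_every_model_update:
            return [list(p) for p in current]
        self._initialized = True
        new_points = self._recalculate_inducing_points(len(current), data)
        # Subclasses must preserve the shape of the inducing points.
        assert len(new_points) == len(current)
        return new_points

    @abstractmethod
    def _recalculate_inducing_points(
        self, M: int, data: Sequence[Point]
    ) -> List[Point]:
        ...


class FirstMSelector(SketchSelector):
    """Toy subclass: take the first M data points (assumes len(data) >= M)."""

    def _recalculate_inducing_points(
        self, M: int, data: Sequence[Point]
    ) -> List[Point]:
        return [list(p) for p in data[:M]]
```

With recalc_every_model_update=False, only the first call consults the data; later calls hand back whatever points they are given.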

class UniformInducingPointSelector(search_space: trieste.space.SearchSpace, recalc_every_model_update: bool = True)

Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]

An InducingPointSelector that chooses points sampled uniformly across the search space.

Parameters
  • search_space – The global search space over which the optimization is defined.

  • recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.

_recalculate_inducing_points(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType

Sample M points. If search_space is a Box then we use a space-filling Sobol design to ensure high diversity.

Parameters
  • M – Desired number of inducing points.

  • model – The sparse model.

  • dataset – The data from the observer.

Returns

The new updated inducing points.
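To give a feel for what a space-filling design over a box looks like, the sketch below generates a low-discrepancy point set in pure Python. Note that it uses a Halton sequence as a simpler stand-in, not the Sobol design that the selector uses; van_der_corput and halton_in_box are hypothetical helper names, and the box is passed as plain lower and upper bound lists.

```python
from typing import List, Sequence


def van_der_corput(index: int, base: int) -> float:
    """Radical inverse of `index` in the given base, a value in [0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton_in_box(
    M: int, lower: Sequence[float], upper: Sequence[float]
) -> List[List[float]]:
    """Return M Halton points rescaled to the box [lower, upper]."""
    primes = [2, 3, 5, 7, 11, 13]  # one coprime base per dimension
    if len(lower) != len(upper) or len(lower) > len(primes):
        raise ValueError("unsupported box dimensionality for this sketch")
    points = []
    for i in range(1, M + 1):  # skip index 0, which maps to the corner
        unit = [van_der_corput(i, primes[d]) for d in range(len(lower))]
        points.append(
            [lo + u * (hi - lo) for u, lo, hi in zip(unit, lower, upper)]
        )
    return points
```

Unlike independent uniform draws, consecutive points of such a sequence avoid clumping, which is the property that makes these designs attractive for placing a small number of inducing points.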

class RandomSubSampleInducingPointSelector(recalc_every_model_update: bool = True)

Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]

An InducingPointSelector that chooses points at random from the training data.

Parameters

recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.

_recalculate_inducing_points(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType

Sample M points from the training data without replacement. If we require more inducing points than training data, then we fill the remaining points with random samples across the search space.

Parameters
  • M – Desired number of inducing points.

  • model – The sparse model.

  • dataset – The data from the observer. Must be populated.

Returns

The new updated inducing points.

Raises

tf.errors.InvalidArgumentError – If dataset is empty.
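The sample-then-fill behaviour described above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not Trieste's code: subsample_with_fill is a hypothetical name, the fill region is passed explicitly as lower and upper bounds, and a ValueError stands in for the TensorFlow error.

```python
import random
from typing import List, Sequence


def subsample_with_fill(
    M: int,
    data: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    seed: int = 0,
) -> List[List[float]]:
    """Pick M points from `data` without replacement, padding if needed."""
    if len(data) == 0:
        raise ValueError("dataset must be populated")
    rng = random.Random(seed)
    # Take as many distinct training points as are available, up to M.
    n_from_data = min(M, len(data))
    chosen = [list(p) for p in rng.sample(list(data), n_from_data)]
    # Fill any shortfall with uniform random samples from the box.
    while len(chosen) < M:
        chosen.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
    return chosen
```

The result always contains exactly M points, so the shape of the inducing points is preserved even early in the optimization when little data is available.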

class KMeansInducingPointSelector(recalc_every_model_update: bool = True)

Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]

An InducingPointSelector that chooses points as centroids of a K-means clustering of the training data.

Parameters

recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.

_recalculate_inducing_points(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType

Calculate M centroids from a K-means clustering of the training data.

If the clustering returns fewer than M centroids or if we have fewer than M training data, then we fill the remaining points with random samples across the search space.

Parameters
  • M – Desired number of inducing points.

  • model – The sparse model.

  • dataset – The data from the observer. Must be populated.

Returns

The new updated inducing points.

Raises

tf.errors.InvalidArgumentError – If dataset is empty.
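As a rough illustration of centroid-based selection with filling, the sketch below runs a basic Lloyd's K-means in pure Python. It is an assumption-laden toy, not the clustering routine Trieste calls: kmeans_centroids and kmeans_with_fill are hypothetical names, the fill region is given as explicit bounds, and empty clusters are simply dropped, which is one way a clustering can return fewer than M centroids.

```python
import random
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(
    data: Sequence[Sequence[float]], k: int, iters: int = 20, seed: int = 0
) -> List[List[float]]:
    """Lloyd's algorithm; may return fewer than k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(data), k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Sequence[float]]] = [[] for _ in centroids]
        for p in data:
            nearest = min(
                range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j])
            )
            clusters[nearest].append(p)
        # Update step: mean of each non-empty cluster.
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] for c in clusters if c
        ]
    return centroids


def kmeans_with_fill(
    M: int,
    data: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    seed: int = 0,
) -> List[List[float]]:
    if len(data) == 0:
        raise ValueError("dataset must be populated")
    rng = random.Random(seed)
    centroids = kmeans_centroids(data, min(M, len(data)), seed=seed)
    # Pad with uniform random samples so that exactly M points are returned.
    while len(centroids) < M:
        centroids.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
    return centroids
```

On two well-separated clusters with M = 2, the returned points are the two cluster means; with M larger than the number of data points, the remainder is padded from the box.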

class QualityFunction

Bases: abc.ABC

A QualityFunction uses a model to measure the quality of each of the N query points in the provided dataset, returning shape [N].

abstract __call__(model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType

Evaluate the quality of the data-points according to the model.

Parameters
  • model – The sparse model.

  • dataset – The data from the observer. Must be populated.

Returns

The quality scores.

class DPPInducingPointSelector(quality_function: QualityFunction, recalc_every_model_update: bool = True)

Bases: InducingPointSelector[trieste.models.gpflow.interface.GPflowPredictor]

An InducingPointSelector that follows [CZZ18] to get a greedy approximation to the MAP estimate of the specified Determinantal Point Process (DPP).

The DPP is defined through its diversity-quality decomposition, i.e. its similarity kernel is just the kernel of the considered model and its quality scores come from the provided QualityFunction.

Parameters
  • quality_function – A function measuring the quality of each candidate inducing point.

  • recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.
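For readers unfamiliar with the decomposition, it can be written out as follows. The notation is an assumption introduced here for illustration: k is the model's kernel, q_i is the quality score of candidate x_i, S is a subset of candidates, and K_S is the kernel matrix restricted to S.

```latex
L_{ij} = q_i \, k(x_i, x_j) \, q_j,
\qquad
P(S) \propto \det(L_S) = \Big( \prod_{i \in S} q_i^{2} \Big) \det(K_S).
```

The product term rewards high-quality candidates, while the determinant of K_S rewards sets whose members are dissimilar under the kernel. The MAP estimate is the size-M subset maximizing det(L_S), which the selector approximates greedily.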

_recalculate_inducing_points(M: int, model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType

Parameters
  • M – Desired number of inducing points.

  • model – The sparse model.

  • dataset – The data from the observer. Must be populated.

Returns

The new updated inducing points.

Raises

tf.errors.InvalidArgumentError – If dataset is empty.

class UnitQualityFunction

Bases: QualityFunction

A QualityFunction where all points are considered equal, i.e. using this quality function for inducing point allocation corresponds to allocating inducing points with the sole aim of minimizing predictive variance.

__call__(model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType

Evaluate the quality of the data-points according to the model.

Parameters
  • model – The sparse model.

  • dataset – The data from the observer. Must be populated.

Returns

The quality scores.

class ModelBasedImprovementQualityFunction

Bases: QualityFunction

A QualityFunction where the quality of each point is given by its expected improvement with respect to a conservative baseline. Expectations are taken according to the model from the previous BO step. See [MOP23] for details and justification.

__call__(model: trieste.models.gpflow.interface.GPflowPredictor, dataset: trieste.data.Dataset) → trieste.types.TensorType

Evaluate the quality of the data-points according to the model.

Parameters
  • model – The sparse model.

  • dataset – The data from the observer. Must be populated.

Returns

The quality scores.
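An improvement-based quality score can be sketched with the standard closed-form expected improvement for a Gaussian prediction. The sketch assumes minimization and takes the baseline to be the largest predicted mean, which is only one possible reading of "conservative baseline"; see [MOP23] for the actual definition. The function names are hypothetical, and the predictive means and standard deviations are passed in as plain lists.

```python
import math
from typing import List, Sequence


def expected_improvement(
    means: Sequence[float], stddevs: Sequence[float], baseline: float
) -> List[float]:
    """E[max(baseline - f, 0)] for f ~ N(mean, stddev^2), per point."""
    scores = []
    for mu, sd in zip(means, stddevs):
        z = (baseline - mu) / sd  # requires sd > 0
        cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        scores.append((baseline - mu) * cdf + sd * pdf)
    return scores


def improvement_quality(
    means: Sequence[float], stddevs: Sequence[float]
) -> List[float]:
    # Assumption: the baseline is the largest predicted mean, so every
    # point receives a strictly positive score.
    return expected_improvement(means, stddevs, baseline=max(means))
```

Points with a low predicted mean or a large predictive uncertainty score highly, so a DPP using these scores concentrates inducing points in promising regions.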

class ConditionalVarianceReduction(recalc_every_model_update: bool = True)

Bases: DPPInducingPointSelector

An InducingPointSelector that greedily chooses the points with maximal (conditional) predictive variance, see [BRVDW19].

Parameters

recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.

class ConditionalImprovementReduction(recalc_every_model_update: bool = True)

Bases: DPPInducingPointSelector

An InducingPointSelector that greedily chooses points with large predictive variance and that are likely to be in promising regions of the search space, see [MOP23].

Parameters

recalc_every_model_update – If True then recalculate the inducing points for each model update, otherwise just recalculate on the first call.

greedy_inference_dpp(M: int, kernel: gpflow.kernels.Kernel, quality_scores: trieste.types.TensorType, dataset: trieste.data.Dataset) → trieste.types.TensorType

Get a greedy approximation of the MAP estimate of the Determinantal Point Process (DPP) over dataset following the algorithm of [CZZ18]. Note that we are using the quality-diversity decomposition of a DPP, specifying both a similarity kernel and quality_scores.

Parameters
  • M – Desired set size.

  • kernel – The underlying kernel of the DPP.

  • quality_scores – The quality score of each item in dataset.

  • dataset – The data from the observer. Must be populated.

Returns

The MAP estimate of the DPP.

Raises

tf.errors.InvalidArgumentError – If dataset is empty or if the shape of quality_scores does not match that of dataset.observations.
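The greedy scheme can be sketched in pure Python using the incremental Cholesky-style updates that make this family of algorithms fast. This is a sketch in the spirit of [CZZ18], not Trieste's implementation: rbf and greedy_dpp_map are hypothetical names, the kernel is a plain Python callable, the function returns indices into the candidate list, and the M ≤ N check with ValueError is a choice made for this sketch.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def rbf(a: Point, b: Point, lengthscale: float = 1.0) -> float:
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / lengthscale**2)


def greedy_dpp_map(
    M: int,
    kernel: Callable[[Point, Point], float],
    quality: Sequence[float],
    points: Sequence[Point],
) -> List[int]:
    """Greedily pick M indices that approximately maximize det(L_S)."""
    n = len(points)
    if n == 0 or len(quality) != n or M > n:
        raise ValueError("need a non-empty candidate set with matching scores")

    def L(i: int, j: int) -> float:
        # Quality-diversity decomposition of the DPP kernel.
        return quality[i] * kernel(points[i], points[j]) * quality[j]

    d2 = [L(i, i) for i in range(n)]  # marginal gain of adding each item
    c: List[List[float]] = [[] for _ in range(n)]  # incremental factors
    remaining = set(range(n))
    selected: List[int] = []
    while len(selected) < M:
        # Ties are broken in favour of the smallest index.
        j = max(sorted(remaining), key=lambda i: d2[i])
        selected.append(j)
        remaining.discard(j)
        dj = math.sqrt(max(d2[j], 1e-12))
        for i in remaining:
            # Condition every remaining item on the newly selected one.
            e = (L(j, i) - sum(a * b for a, b in zip(c[j], c[i]))) / dj
            c[i].append(e)
            d2[i] -= e * e
    return selected
```

With unit quality scores, d2 tracks the conditional variance of each candidate given the items already chosen, so the loop repeatedly picks the point the current selection explains least well. Lowering the quality score of a distant point can make a nearby but higher-quality point win instead.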