An overview of Trieste types#
Trieste is dedicated to Bayesian optimization, the process of finding the optimal values of an expensive, black-box objective function by employing probabilistic models over observations. This notebook explains how the different parts of this process are represented by different types in the code, and how these types can be extended.
Key types#
The following types represent the key concepts in Trieste. For a full listing of all the types in Trieste, see the API reference.
Observer#
The Observer type definition represents the black-box objective function. Observers are functions that accept query points and return datasets containing the resulting observations. Observations are either a single objective value that we wish to optimize, or a dictionary of multiple tagged values that must be combined somehow, for example an objective and an inequality constraint. Objective values can be either single- or multi-dimensional (see multi-objective optimization).
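For reference, a sketch of roughly how the observer types are defined (the actual definitions live in trieste.observer and may differ in detail):

from typing import Callable, Mapping, Union

from trieste.data import Dataset
from trieste.types import TensorType

# A single-observation observer maps query points to one Dataset, while a
# multi-observation observer maps them to a dictionary of tagged Datasets.
SingleObserver = Callable[[TensorType], Dataset]
MultiObserver = Callable[[TensorType], Mapping[str, Dataset]]
Observer = Union[SingleObserver, MultiObserver]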
Dataset#
The Dataset container class represents the query points and observations from a single observer. Observers with multiple observations are represented by a dictionary of multiple tagged Datasets.
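For example, a Dataset can be constructed directly from tensors of query points and observations, and datasets can be concatenated with +:

import tensorflow as tf

from trieste.data import Dataset

# two query points in a two-dimensional search space, with one observation each
dataset = Dataset(
    query_points=tf.constant([[0.0, 1.0], [1.0, 1.0]], dtype=tf.float64),
    observations=tf.constant([[-1.0], [-4.0]], dtype=tf.float64),
)

# datasets can be concatenated with +, e.g. to append new observations
combined = dataset + dataset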
ProbabilisticModel#
The ProbabilisticModel protocol represents any probabilistic model used to model observations. As with Dataset, observers with multiple observations are modelled by a dictionary of multiple tagged models.
At its simplest, a ProbabilisticModel is anything that implements a predict and a sample method. However, many algorithms in Trieste depend on models with additional features, which are represented by the various subclasses of ProbabilisticModel. The standard Bayesian optimizer uses TrainableProbabilisticModel models, which also implement an update method (to update the model structure when new data is added to the training set) and an optimize method (to optimize the model training loss). Specific acquisition functions may require other features, represented by classes like SupportsPredictJoint (the ability to predict the joint distribution at several inputs) and SupportsGetObservationNoise (the ability to predict the observation noise variance). Since these are defined as protocols, it is possible to define and depend on intersections of different model types (e.g. to support only models that are both SupportsPredictJoint and SupportsGetObservationNoise).
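Because these feature protocols are runtime checkable, code can verify a model's capabilities before relying on them. A minimal sketch:

from trieste.models.interfaces import (
    ProbabilisticModel,
    SupportsGetObservationNoise,
    SupportsPredictJoint,
)

def check_model_features(model: ProbabilisticModel) -> None:
    # a sketch, assuming the protocols in trieste.models.interfaces are
    # runtime checkable: fail early if a required feature is missing
    if not isinstance(model, SupportsPredictJoint):
        raise ValueError("this acquisition function requires predict_joint")
    if not isinstance(model, SupportsGetObservationNoise):
        raise ValueError("this acquisition function requires get_observation_noise")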
Multiple models can also be combined into a single ModelStack model that combines their outputs for prediction and sampling. This can be useful when modelling multi-objective observations with independent single-output models. There are also constructors like TrainableModelStack and PredictJointModelStack that combine specific types of model for use with code that requires that type, delegating the relevant methods where appropriate.
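For instance, two independent single-output models might be stacked to model a two-objective observer (a sketch; each tuple pairs a model with the number of outputs it models, following the ModelStack constructor):

from trieste.models.interfaces import (
    TrainableModelStack,
    TrainableProbabilisticModel,
)

def stack_two_objectives(
    model_1: TrainableProbabilisticModel, model_2: TrainableProbabilisticModel
) -> TrainableModelStack:
    # the stack delegates predict/sample/update/optimize to its components,
    # concatenating their outputs along the event dimension
    return TrainableModelStack((model_1, 1), (model_2, 1))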
SearchSpace#
The SearchSpace base class represents the domain over which the objective function is to be optimized. Spaces can currently be either continuous Box spaces, discrete DiscreteSearchSpace spaces, or a TaggedProductSearchSpace product of multiple spaces. All search spaces expose their dimensions, bounds and a sampler.
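A brief sketch of constructing and sampling search spaces:

import tensorflow as tf

from trieste.space import Box, DiscreteSearchSpace, TaggedProductSearchSpace

# a continuous unit square and a small discrete space
continuous = Box([0.0, 0.0], [1.0, 1.0])
discrete = DiscreteSearchSpace(tf.constant([[0.0], [0.5], [1.0]], dtype=tf.float64))

# a tagged product of the two, giving a three-dimensional hybrid space
hybrid = TaggedProductSearchSpace(
    spaces=[continuous, discrete], tags=["continuous", "discrete"]
)

# all search spaces support uniform sampling
samples = hybrid.sample(10)  # shape [10, 3]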
AcquisitionRule#
The AcquisitionRule base class represents a routine for selecting new query points during a Bayesian optimization loop (via an acquire method). It is generic on three types:
ResultType: the output of the rule; typically this is just a tensor of query points, but it can also be a function that accepts some acquisition state and returns the query points along with a new state.
SearchSpaceType: the type of the search space; any optimizer that the rule uses must support this.
ProbabilisticModelType: the type of the models; any acquisition functions or samplers that the rule uses must support this.
Examples of rules include:
EfficientGlobalOptimization (EGO) is the most commonly used rule; it uses acquisition functions and optimizers to select new query points.
AsynchronousOptimization is similar to EGO but uses acquisition state to keep track of points whose observations are still pending.
DiscreteThompsonSampling uses Thompson samplers rather than acquisition functions to select new query points.
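For example, an EGO rule is typically constructed from an acquisition function builder (a minimal sketch using the built-in ExpectedImprovement builder):

from trieste.acquisition import ExpectedImprovement
from trieste.acquisition.rule import EfficientGlobalOptimization

# a rule that, at each step, maximizes expected improvement to pick query points
rule = EfficientGlobalOptimization(ExpectedImprovement())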
AcquisitionFunction and AcquisitionFunctionBuilder#
The AcquisitionFunction type definition represents any acquisition function: that is, a function that maps a set of query points to a single value that describes how useful it would be to evaluate all these points together.
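A minimal sketch (assuming Trieste's batch shape convention, where acquisition functions map points of shape [..., B, D] to values of shape [..., 1]):

import tensorflow as tf

from trieste.types import TensorType

def trivial_acquisition(at: TensorType) -> TensorType:
    # `at` has shape [..., B, D]: a batch of B candidate points of dimension D;
    # return a single (here, made-up) utility value of shape [..., 1]
    return -tf.reduce_sum(at**2, axis=[-2, -1])[..., None]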
The AcquisitionFunctionBuilder base class, meanwhile, represents something that builds and updates acquisition functions. At the start of the Bayesian optimization, the builder's prepare_acquisition_function method is called by the acquisition rule to create an acquisition function from the current observations and probabilistic models. To avoid unnecessary TensorFlow compilations, most builders also define an update_acquisition_function method for updating the function using the updated observations and models. (Those that don't instead generate a new acquisition function whenever necessary.)
Acquisition functions that support only one probabilistic model are more easily defined using the SingleModelAcquisitionBuilder convenience class, which avoids having to deal with dictionaries.
Acquisition functions that are suitable for greedily building batches of points can be defined using GreedyAcquisitionFunctionBuilder (or SingleModelGreedyAcquisitionBuilder), whose prepare_acquisition_function method also accepts the points already chosen to be in the current batch, as sketched below.
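A simplified sketch of the greedy builder interface (not the library class itself; the pending_points argument carries the batch points chosen so far):

from typing import Optional

from trieste.acquisition import AcquisitionFunction
from trieste.data import Dataset
from trieste.models.interfaces import ProbabilisticModel
from trieste.types import TensorType

class GreedyBuilderSketch:
    # a sketch of SingleModelGreedyAcquisitionBuilder's key method
    def prepare_acquisition_function(
        self,
        model: ProbabilisticModel,
        dataset: Optional[Dataset] = None,
        pending_points: Optional[TensorType] = None,  # points already in the batch
    ) -> AcquisitionFunction:
        raise NotImplementedError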
AcquisitionOptimizer#
The AcquisitionOptimizer type definition represents an optimizer function that maximizes an acquisition function over a search space. Trieste provides a generate_continuous_optimizer function for generating gradient-based optimizers for continuous (or hybrid) spaces, an optimize_discrete function for optimizing discrete spaces, and automatic_optimizer_selector for quickly selecting an appropriate optimizer.
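For example, a rule can be given an explicit optimizer (a sketch; the keyword argument values shown are illustrative choices, not recommended defaults):

from trieste.acquisition import ExpectedImprovement
from trieste.acquisition.optimizer import generate_continuous_optimizer
from trieste.acquisition.rule import EfficientGlobalOptimization

# a gradient-based optimizer for continuous spaces, restarted from several
# initial candidates to reduce the chance of finding only a local maximum
optimizer = generate_continuous_optimizer(
    num_initial_samples=1000,
    num_optimization_runs=10,
)

# rules accept the optimizer alongside the acquisition function builder
rule = EfficientGlobalOptimization(ExpectedImprovement(), optimizer=optimizer)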
BayesianOptimizer and AskTellOptimizer#
The BayesianOptimizer and AskTellOptimizer classes are the two Bayesian optimization interfaces provided by Trieste. Both classes provide monitoring using TensorBoard.
BayesianOptimizer exposes an optimize method for running a Bayesian optimization loop with given initial datasets and models, a given number of steps, and an optional early stop callback.
AskTellOptimizer provides greater control over the loop, providing separate ask and tell steps for suggesting query points and updating the models with new observations.
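A sketch of the two loops side by side (assuming observer, search_space, initial_data and model have already been set up):

from trieste.ask_tell_optimization import AskTellOptimizer
from trieste.bayesian_optimizer import BayesianOptimizer

# one-shot loop: run fifteen optimization steps and extract the final dataset
bo = BayesianOptimizer(observer, search_space)
result = bo.optimize(15, initial_data, model)
dataset = result.try_get_final_dataset()

# ask/tell loop: the caller drives each step explicitly
ask_tell = AskTellOptimizer(search_space, initial_data, model)
for _ in range(15):
    query_points = ask_tell.ask()
    ask_tell.tell(observer(query_points))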
Extending the key types#
[1]:
from __future__ import annotations
from typing import Optional
import tensorflow as tf
from trieste.types import TensorType
This section explains how to define new observers, model types and acquisition functions.
New observers#
Defining an observer with a single observation can be as simple as defining a function that returns that observation:
[2]:
from trieste.objectives.utils import mk_observer
def simple_quadratic(x: TensorType) -> TensorType:
    "A trivial quadratic function over :math:`[0, 1]^2`."
    return -tf.math.reduce_sum(x, axis=-1, keepdims=True) ** 2
observer = mk_observer(simple_quadratic)
observer(tf.constant([[0, 1], [1, 1]], dtype=tf.float64))
[2]:
Dataset(query_points=<tf.Tensor: shape=(2, 2), dtype=float64, numpy=
array([[0., 1.],
[1., 1.]])>, observations=<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[-1.],
[-4.]])>)
A multi-observation observer can similarly be constructed from multiple functions:
[3]:
from trieste.objectives.utils import mk_multi_observer
def simple_constraint(x: TensorType) -> TensorType:
    "A trivial constraint over :math:`[0, 1]^2`."
    return tf.math.reduce_min(x, axis=-1, keepdims=True)

multiobserver = mk_multi_observer(
    OBJECTIVE=simple_quadratic, CONSTRAINT=simple_constraint
)
multiobserver(tf.constant([[0, 1], [1, 1]], dtype=tf.float64))
[3]:
{'OBJECTIVE': Dataset(query_points=<tf.Tensor: shape=(2, 2), dtype=float64, numpy=
array([[0., 1.],
[1., 1.]])>, observations=<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[-1.],
[-4.]])>),
'CONSTRAINT': Dataset(query_points=<tf.Tensor: shape=(2, 2), dtype=float64, numpy=
array([[0., 1.],
[1., 1.]])>, observations=<tf.Tensor: shape=(2, 1), dtype=float64, numpy=
array([[0.],
[1.]])>)}
Note however that observers are not restricted to returning datasets containing precisely the observed query points: if need be, they can also return arbitrary Datasets with missing or additional points and observations.
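For instance, an observer might augment the requested points with a cached evaluation (a sketch; the extra point here is purely illustrative):

from trieste.data import Dataset

def observer_with_extra_point(query_points: TensorType) -> Dataset:
    # evaluate the requested points plus one hypothetical cached point...
    extra = tf.constant([[0.5, 0.5]], dtype=tf.float64)
    all_points = tf.concat([query_points, extra], axis=0)
    # ...and return a dataset that includes all the observations
    return Dataset(query_points=all_points, observations=simple_quadratic(all_points))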
New probabilistic model types#
Defining a new probabilistic model type simply involves writing a class that implements all the relevant methods (at the very least predict and sample). For clarity, it is best to also explicitly inherit from the supported feature protocols.
[4]:
from trieste.models.interfaces import (
    TrainableProbabilisticModel,
    HasReparamSampler,
    ReparametrizationSampler,
    SupportsGetObservationNoise,
)

class GizmoModel(
    TrainableProbabilisticModel, HasReparamSampler, SupportsGetObservationNoise
):
    "Made-up trainable model type that supports both reparam_sampler and get_observation_noise."

    def predict(
        self, query_points: TensorType
    ) -> tuple[TensorType, TensorType]:
        raise NotImplementedError

    def reparam_sampler(
        self, num_samples: int
    ) -> ReparametrizationSampler[GizmoModel]:
        raise NotImplementedError

    ...  # sample, update, optimize, get_observation_noise
If the new model type has an additional feature on which you'd like to depend, e.g. in a new acquisition function, then you can define that feature as a protocol. Marking it runtime_checkable will allow you to check for the feature elsewhere in your code too.
[5]:
from trieste.models.interfaces import ProbabilisticModel
from typing_extensions import Protocol, runtime_checkable

@runtime_checkable
class HasGizmo(ProbabilisticModel, Protocol):
    "A probabilistic model that has a 'gizmo' method."

    def gizmo(self) -> int:
        "A 'gizmo' method."
        raise NotImplementedError
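Since the protocol is runtime checkable, code elsewhere can then test for the feature with isinstance (a minimal sketch):

def use_gizmo(model: ProbabilisticModel) -> int:
    # fall back gracefully when the model lacks the feature
    if isinstance(model, HasGizmo):
        return model.gizmo()
    raise ValueError("model must support the gizmo feature")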
If your acquisition function depends on a combination of features, then you can define an intersection protocol and use it when defining the acquisition function:
[6]:
@runtime_checkable
class HasGizmoReparamSamplerAndObservationNoise(
    HasGizmo, HasReparamSampler, SupportsGetObservationNoise, Protocol
):
    """A model that supports gizmo, reparam_sampler and get_observation_noise."""

    pass
New acquisition function builders#
To define a new acquisition function builder, you simply need to define a class with a prepare_acquisition_function method that returns an AcquisitionFunction. If the acquisition function depends on just one model/dataset (as is often the case) then you can define it as a SingleModelAcquisitionBuilder; if it depends on more than one (e.g. both an objective and a constraint) then you must define it as an AcquisitionFunctionBuilder instead. You can also specify (in brackets) the type of probabilistic model that the acquisition function supports: e.g. a SingleModelAcquisitionBuilder[HasReparamSampler] only supports models with a reparametrization sampler. This allows the type checker to warn you if you try to use an incompatible model type.
[7]:
from trieste.acquisition import (
AcquisitionFunction,
SingleModelAcquisitionBuilder,
)
from trieste.data import Dataset
class ProbabilityOfValidity(SingleModelAcquisitionBuilder[ProbabilisticModel]):
    def prepare_acquisition_function(
        self, model: ProbabilisticModel, dataset: Optional[Dataset] = None
    ) -> AcquisitionFunction:
        def acquisition(at: TensorType) -> TensorType:
            mean, _ = model.predict_y(tf.squeeze(at, -2))
            return mean

        return acquisition
For efficiency, it usually makes sense to compile the generated acquisition function into a TensorFlow graph using tf.function. Furthermore, to avoid generating (and compiling) a new acquisition function on each Bayesian optimization loop, you can define an update_acquisition_function method that can instead update the previously generated acquisition function using the new models and data. This may involve updating the acquisition function's internal state (which you should store in tf.Variables), though if the function has no internal state then it is sufficient to simply return the old function unchanged.
[8]:
class ProbabilityOfValidity2(SingleModelAcquisitionBuilder[ProbabilisticModel]):
    def prepare_acquisition_function(
        self, model: ProbabilisticModel, dataset: Optional[Dataset] = None
    ) -> AcquisitionFunction:
        @tf.function
        def acquisition(at: TensorType) -> TensorType:
            mean, _ = model.predict_y(tf.squeeze(at, -2))
            return mean

        return acquisition

    def update_acquisition_function(
        self,
        function: AcquisitionFunction,
        model: ProbabilisticModel,
        dataset: Optional[Dataset] = None,
    ) -> AcquisitionFunction:
        return function  # no need to update anything
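The new builder can then be plugged into an acquisition rule in the usual way (a sketch, reusing the rule class introduced above):

from trieste.acquisition.rule import EfficientGlobalOptimization

# an EGO rule driven by the custom acquisition function builder
rule = EfficientGlobalOptimization(ProbabilityOfValidity2())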