trieste.acquisition.function.active_learning#

This module contains acquisition function builders and acquisition functions for Bayesian active learning.

Module Contents#

class PredictiveVariance(jitter: float = DEFAULTS.JITTER)[source]#

Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.interfaces.SupportsPredictJoint]

Builder for the determinant of the predictive covariance matrix over the batch points. For a batch of size 1 it is the same as maximizing the predictive variance.

Parameters

jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.

prepare_acquisition_function(self, model: trieste.models.interfaces.SupportsPredictJoint, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • model – The model.

  • dataset – Unused.

Returns

The determinant of the predictive covariance.

update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.interfaces.SupportsPredictJoint, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • function – The acquisition function to update.

  • model – The model.

  • dataset – Unused.

predictive_variance(model: trieste.models.interfaces.SupportsPredictJoint, jitter: float) → trieste.acquisition.interface.AcquisitionFunction[source]#

The predictive variance acquisition function for active learning, based on the determinant of the covariance (see [Mac92] for details). Note that the model needs to supply covariance of the joint marginal distribution, which can be expensive to compute.

Parameters
  • model – The model of the objective function.

  • jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
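To make the criterion concrete, here is a minimal NumPy sketch (an illustration of the quantity being maximised, not the trieste implementation) of the jitter-stabilised determinant of a batch predictive covariance:

```python
import numpy as np

def batch_predictive_variance(cov: np.ndarray, jitter: float = 1e-6) -> float:
    """Determinant of a [B, B] batch predictive covariance, stabilised with jitter.

    ``cov`` stands in for the covariance of the joint marginal distribution that
    the model would supply over a batch of B query points. Jitter is added to
    the diagonal before the Cholesky factorisation, mirroring the ``jitter``
    parameter documented above.
    """
    stabilised = cov + jitter * np.eye(cov.shape[0])
    chol = np.linalg.cholesky(stabilised)
    # det(cov) = prod(diag(L))^2; accumulate in log space for numerical safety
    return float(np.exp(2.0 * np.sum(np.log(np.diag(chol)))))

# For a batch of size 1 the determinant is just the predictive variance:
print(batch_predictive_variance(np.array([[0.25]]), jitter=0.0))  # ≈ 0.25
```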

class ExpectedFeasibility(threshold: float, alpha: float = 1, delta: int = 1)[source]#

Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.ProbabilisticModel]

Builder for the expected feasibility acquisition function for identifying a failure or feasibility region. It implements two related sampling strategies: the bichon criterion ([BES+08]) and the ranjan criterion ([RBM08]). Both criteria aim to sample points whose mean is close to the threshold and whose variance is high.

Parameters
  • threshold – The failure or feasibility threshold.

  • alpha – The parameter which determines the neighbourhood around the estimated contour line, as a percentage of the posterior variance, in which to allocate new points. Defaults to 1.

  • delta – The parameter identifying which criterion is used: bichon for a value of 1 (default) and ranjan for a value of 2.

Raises

ValueError (or InvalidArgumentError) – If any argument is not a scalar, if alpha is not positive, or if delta is not 1 or 2.

prepare_acquisition_function(self, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • model – The model.

  • dataset – Unused.

Returns

The expected feasibility function. This function will raise ValueError or InvalidArgumentError if used with a batch size greater than one.

update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • function – The acquisition function to update.

  • model – The model.

  • dataset – The data from the observer (optional).

Returns

The updated acquisition function.

bichon_ranjan_criterion(model: trieste.models.ProbabilisticModel, threshold: float, alpha: float, delta: int) → trieste.acquisition.interface.AcquisitionFunction[source]#

Return the bichon ([BES+08]) or ranjan ([RBM08]) criterion used in the expected feasibility acquisition function for active learning of failure or feasibility regions.

The problem of identifying a failure or feasibility region of a function \(f\) can be formalized as estimating the excursion set, \(\Gamma^* = \{ x \in X: f(x) \ge T\}\), or estimating the contour line, \(C^* = \{ x \in X: f(x) = T\}\), for some threshold \(T\) (see [BGL+12] for more details).

It turns out that probabilistic models can be used as classifiers for identifying where excursion probability is larger than 1/2 and this idea is used to build many sequential sampling strategies. We follow [BGL+12] and use a formulation which provides a common expression for these two criteria:

\[\mathbb{E}[\max(0, (\alpha s(x))^\delta - |T - m(x)|^\delta)]\]

Here \(m(x)\) and \(s(x)\) are the mean and standard deviation of the predictive posterior of a probabilistic model. The bichon criterion is obtained when \(\delta = 1\), while the ranjan criterion is obtained when \(\delta = 2\). The parameter \(\alpha > 0\) acts as a percentage of the posterior standard deviation around the current boundary estimate within which we want to sample. The goal is to sample a point whose mean is close to the threshold \(T\) and whose variance is high, so that the positive difference in the equation above is as large as possible.

Note that only batches of size 1 are allowed.

Parameters
  • model – The probabilistic model of the objective function.

  • threshold – The failure or feasibility threshold.

  • alpha – The parameter which determines the neighbourhood around the estimated contour line as a percentage of the posterior variance in which to allocate new points.

  • delta – The parameter identifying which criterion is used, bichon for value of 1 and ranjan for value of 2.
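The expectation in the criterion can be checked numerically. Below is a hypothetical Monte Carlo sketch that samples \(f(x) \sim N(m(x), s(x)^2)\) from the posterior; this is only an illustration of the quantity being estimated, not how the library computes it:

```python
import numpy as np

def expected_feasibility_mc(m, s, threshold, alpha=1.0, delta=1,
                            n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[max(0, (alpha * s)^delta - |T - f|^delta)]
    with f ~ N(m, s^2): delta=1 gives the bichon criterion, delta=2 ranjan."""
    rng = np.random.default_rng(seed)
    f = rng.normal(m, s, size=n_samples)
    gain = (alpha * s) ** delta - np.abs(threshold - f) ** delta
    return float(np.mean(np.maximum(0.0, gain)))

# The criterion peaks when the posterior mean sits on the threshold and is
# nearly zero when the mean is far from it:
near = expected_feasibility_mc(m=0.0, s=1.0, threshold=0.0)
far = expected_feasibility_mc(m=5.0, s=1.0, threshold=0.0)
print(near > far)  # True
```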

class IntegratedVarianceReduction(integration_points: trieste.types.TensorType, threshold: Optional[Union[float, Sequence[float], trieste.types.TensorType]] = None)[source]#

Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.interfaces.FastUpdateModel]

Builder for the reduction of the integral of the predicted variance over the search space given a batch of query points.

Parameters
  • integration_points – The set of points over which to integrate the prediction variance.

  • threshold – Either None, a float, or a sequence of 1 or 2 float values; see integrated_variance_reduction for how each case is interpreted.

prepare_acquisition_function(self, model: trieste.models.interfaces.FastUpdateModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • model – The model.

  • dataset – Unused.

Returns

The integral of the predictive variance.

update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.interfaces.FastUpdateModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • function – The acquisition function to update.

  • model – The model.

  • dataset – Unused.

class integrated_variance_reduction(model: trieste.models.interfaces.FastUpdateModel, integration_points: trieste.types.TensorType, threshold: Optional[Union[float, Sequence[float], trieste.types.TensorType]] = None)[source]#

Bases: trieste.acquisition.interface.AcquisitionFunctionClass

The reduction of the (weighted) average of the predicted variance over the integration points (a.k.a. the Integrated Mean Squared Error or IMSE criterion). See [PGR+10] for details.

The criterion (to maximise) is:

\[\int_x (v_{old}(x) - v_{new}(x)) * weights(x),\]

where \(v_{old}(x)\) is the predictive variance of the model at \(x\), and \(v_{new}(x)\) is the updated predictive variance, given that the GP is further conditioned on the query points.

Note that since \(v_{old}(x)\) is constant w.r.t. the query points, this function only returns \(-\int_x v_{new}(x) * weights(x)\).

If no threshold is provided, the goal is to learn a globally accurate model, and the unweighted predictive variance (\(v_{new}\)) is used. Otherwise, learning is ‘targeted’ towards regions where the GP is close to particular values: the variance is weighted by the posterior GP pdf evaluated at the threshold T (if a single value is given), or by the probability that the GP posterior lies in the interval between the two thresholds T1 and T2 (note the slightly different parametrisation compared to [PGR+10] in that case).

This criterion allows batch size > 1. Note that the computational cost grows cubically with the batch size.

This criterion requires a method (conditional_predict_f) to compute the new predictive variance given that query points are added to the data.

Parameters
  • model – The model of the objective function.

  • integration_points – Points over which to integrate the objective prediction variance.

  • threshold – Either None, a float or a sequence of 1 or 2 float values. See class docs for details.

Raises

ValueError (or InvalidArgumentError) – If threshold has more than 2 values.
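For intuition, the quantity \(-\int_x v_{new}(x) * weights(x)\) can be sketched for a toy one-dimensional GP with a squared-exponential kernel and unit prior variance, using standard Gaussian conditioning. This is a hypothetical illustration (kernel, noise level and point sets are assumptions), not the FastUpdateModel machinery:

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def neg_integrated_new_variance(query, integration_points, noise=1e-2, weights=None):
    """-mean_x v_new(x) * weights(x): the quantity this criterion maximises.

    v_new is the GP predictive variance after conditioning on the query batch;
    larger (less negative) values mean more variance removed on average.
    """
    if weights is None:
        weights = np.ones_like(integration_points)
    K_qq = rbf(query, query) + noise * np.eye(len(query))
    K_xq = rbf(integration_points, query)
    # prior variance k(x, x) = 1 for this RBF kernel
    v_new = 1.0 - np.einsum("ij,jk,ik->i", K_xq, np.linalg.inv(K_qq), K_xq)
    return float(-np.mean(v_new * weights))

x_int = np.linspace(-2.0, 2.0, 50)
# A query point inside the integration region reduces more variance than a distant one:
print(neg_integrated_new_variance(np.array([0.0]), x_int) >
      neg_integrated_new_variance(np.array([10.0]), x_int))  # True
```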

__call__(self, x: trieste.types.TensorType) → trieste.types.TensorType[source]#

Call acquisition function.

class BayesianActiveLearningByDisagreement(jitter: float = DEFAULTS.JITTER)[source]#

Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.ProbabilisticModel]

Builder for the Bayesian Active Learning By Disagreement acquisition function defined in [HHGL11].

Parameters

jitter – The size of the jitter used to avoid numerical problems caused by the log operation when the variance is close to zero.

prepare_acquisition_function(self, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • model – The model.

  • dataset – Unused.

Returns

The Bayesian Active Learning By Disagreement acquisition function.

update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction[source]#
Parameters
  • function – The acquisition function to update.

  • model – The model.

  • dataset – Unused.

class bayesian_active_learning_by_disagreement(model: trieste.models.ProbabilisticModel, jitter: float)[source]#

Bases: trieste.acquisition.interface.AcquisitionFunctionClass

An AcquisitionFunctionClass is an acquisition function represented using a class rather than as a standalone function. Using a class to represent an acquisition function makes it easier to update it, to avoid having to retrace the function on every call.

The Bayesian Active Learning By Disagreement acquisition function computes the information gain of the predictive entropy [HHGL11]. The acquisition function is calculated as:

\[\mathrm{h}\left(\Phi\left(\frac{\mu_{\boldsymbol{x}, \mathcal{D}}} {\sqrt{\sigma_{\boldsymbol{x}, \mathcal{D}}^{2}+1}}\right)\right) -\frac{C \exp \left(-\frac{\mu_{\boldsymbol{x}, \mathcal{D}}^{2}} {2\left(\sigma_{\boldsymbol{x}, \mathcal{D}}^{2}+C^{2}\right)}\right)} {\sqrt{\sigma_{\boldsymbol{x}, \mathcal{D}}^{2}+C^{2}}}\]

Here \(\mathrm{h}(p)\) is defined as:

\[\mathrm{h}(p)=-p \log p-(1-p) \log (1-p)\]

This acquisition function is intended for use with binary Gaussian process classification models with a Bernoulli likelihood. It is designed for VGP, but other Gaussian approximations of the posterior can be used, for instance SVGP or some other model not currently supported by Trieste. Integrating over nuisance parameters is currently not supported (see equation 6 of the paper).
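As a numerical illustration of the expression above, the following sketch evaluates it with the constant \(C = \sqrt{\pi \ln 2 / 2}\) from [HHGL11], working in bits so that \(\mathrm{h}(1/2) = 1\) and the two terms agree in scale. This is a standalone sketch under those assumptions, not the trieste implementation:

```python
import numpy as np
from math import erf

C = np.sqrt(np.pi * np.log(2.0) / 2.0)  # constant from the [HHGL11] approximation

def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2(1 - p), in bits."""
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

def bald(mean, variance, jitter=1e-10):
    """BALD score for a latent N(mean, variance) at a single query point.

    ``jitter`` keeps the log arguments away from zero, as in the parameter above.
    """
    p = np.clip(normal_cdf(mean / np.sqrt(variance + 1.0)), jitter, 1.0 - jitter)
    marginal_entropy = binary_entropy(p)
    expected_entropy = (C * np.exp(-mean ** 2 / (2.0 * (variance + C ** 2)))
                        / np.sqrt(variance + C ** 2))
    return marginal_entropy - expected_entropy

# Information gain grows with latent uncertainty at the decision boundary:
print(bald(0.0, 4.0) > bald(0.0, 0.1))  # True
```

Note that with zero latent variance the two terms cancel, so the score is zero: a confident model gains no information from the observation.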

Parameters
  • model – The model of the objective function.

  • jitter – The size of the jitter used to avoid numerical problems caused by the log operation when the variance is close to zero.

Returns

The Bayesian Active Learning By Disagreement acquisition function.

__call__(self, x: trieste.types.TensorType) → trieste.types.TensorType[source]#

Call acquisition function.