trieste.acquisition.function.active_learning
This module contains acquisition function builders and acquisition functions for Bayesian active learning.
Module Contents
-
class PredictiveVariance(jitter: float = DEFAULTS.JITTER)
Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.interfaces.SupportsPredictJoint]
Builder for the determinant of the predictive covariance matrix over the batch points. For a batch of size 1 it is the same as maximizing the predictive variance.
- Parameters
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
-
prepare_acquisition_function(self, model: trieste.models.interfaces.SupportsPredictJoint, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
model – The model.
dataset – Unused.
- Returns
The predictive variance acquisition function, that is, the determinant of the predictive covariance over the batch points.
-
update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.interfaces.SupportsPredictJoint, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
function – The acquisition function to update.
model – The model.
dataset – Unused.
-
predictive_variance(model: trieste.models.interfaces.SupportsPredictJoint, jitter: float) → trieste.acquisition.interface.AcquisitionFunction
The predictive variance acquisition function for active learning, based on the determinant of the covariance (see [Mac92] for details). Note that the model needs to supply the covariance of the joint marginal distribution, which can be expensive to compute.
- Parameters
model – The model of the objective function.
jitter – The size of the jitter to use when stabilising the Cholesky decomposition of the covariance matrix.
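As a rough illustration of the determinant-based criterion described above, the following standard-library sketch computes the determinant of a small batch covariance through its Cholesky factor. The helper names (`cholesky`, `covariance_determinant`) are hypothetical, and adding the jitter to the diagonal is an assumption made for this example; it is not taken from the Trieste source.

```python
import math


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def covariance_determinant(covariance, jitter=1e-6):
    """Determinant of a batch covariance; jitter on the diagonal is an assumption."""
    n = len(covariance)
    stabilised = [
        [covariance[i][j] + (jitter if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    lower = cholesky(stabilised)
    # For A = L L^T, det(A) is the squared product of the diagonal of L.
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    return math.exp(log_det)


# Batch of size 1: the determinant is just the predictive variance.
single = covariance_determinant([[0.5]], jitter=0.0)
# Batch of size 2: strongly correlated points give a smaller determinant
# than independent points with the same marginal variances.
correlated = covariance_determinant([[1.0, 0.9], [0.9, 1.0]], jitter=0.0)
independent = covariance_determinant([[1.0, 0.0], [0.0, 1.0]], jitter=0.0)
```

The comparison between `correlated` and `independent` shows why the determinant favours batches whose points carry complementary information rather than batches of near-duplicates.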
-
class ExpectedFeasibility(threshold: float, alpha: float = 1, delta: int = 1)
Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.ProbabilisticModel]
Builder for the expected feasibility acquisition function for identifying a failure or feasibility region. It implements two related sampling strategies, the Bichon criterion ([BES+08]) and the Ranjan criterion ([RBM08]). The goal of both criteria is to sample points with a mean close to the threshold and a high variance.
- Parameters
threshold – The failure or feasibility threshold.
alpha – The parameter which determines the neighbourhood around the estimated contour line, as a proportion of the posterior standard deviation, in which to allocate new points. Defaults to 1.
delta – The parameter identifying which criterion is used: 1 (default) for Bichon and 2 for Ranjan.
- Raises
ValueError (or InvalidArgumentError) – If any argument is not a scalar, or alpha is not positive, or delta is not 1 or 2.
-
prepare_acquisition_function(self, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
model – The model.
dataset – Unused.
- Returns
The expected feasibility function. This function will raise ValueError or InvalidArgumentError if used with a batch size greater than one.
-
update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
function – The acquisition function to update.
model – The model.
dataset – The data from the observer (optional).
- Returns
The updated acquisition function.
-
bichon_ranjan_criterion(model: trieste.models.ProbabilisticModel, threshold: float, alpha: float, delta: int) → trieste.acquisition.interface.AcquisitionFunction
Return the Bichon criterion ([BES+08]) or the Ranjan criterion ([RBM08]), depending on delta, as used in the expected feasibility acquisition function for active learning of failure or feasibility regions.
The problem of identifying a failure or feasibility region of a function \(f\) can be formalized as estimating the excursion set, \(\Gamma^* = \{ x \in X: f(x) \ge T\}\), or estimating the contour line, \(C^* = \{ x \in X: f(x) = T\}\), for some threshold \(T\) (see [BGL+12] for more details).
Probabilistic models can be used as classifiers that identify where the excursion probability is larger than 1/2, and this idea underlies many sequential sampling strategies. We follow [BGL+12] and use a formulation which provides a common expression for these two criteria:
\[\mathbb{E}[\max(0, (\alpha s(x))^\delta - |T - m(x)|^\delta)]\]
Here \(m(x)\) and \(s(x)\) are the mean and standard deviation of the predictive posterior of a probabilistic model. The Bichon criterion is obtained when \(\delta = 1\), and the Ranjan criterion when \(\delta = 2\). The parameter \(\alpha>0\) sets the width, as a proportion of the posterior standard deviation, of the band around the current boundary estimate in which we want to sample. The goal is to sample a point with a mean close to the threshold \(T\) and a high variance, so that the positive difference in the equation above is as large as possible.
Note that only batches of size 1 are allowed.
- Parameters
model – The probabilistic model of the objective function.
threshold – The failure or feasibility threshold.
alpha – The parameter which determines the neighbourhood around the estimated contour line, as a proportion of the posterior standard deviation, in which to allocate new points.
delta – The parameter identifying which criterion is used: 1 for Bichon and 2 for Ranjan.
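The behaviour of the common expression can be explored with a small Monte Carlo sketch in plain Python. It rests on an assumption that should be read with care: the expectation is taken over draws from the Gaussian posterior, \(f(x) \sim N(m(x), s(x)^2)\), with each draw taking the place of the mean inside the absolute value. The function names and the sampling approach are illustrative only and are not Trieste's implementation.

```python
import random


def integrand(value, std, threshold, alpha, delta):
    """max(0, (alpha * s)^delta - |T - value|^delta) for a single function value."""
    return max(0.0, (alpha * std) ** delta - abs(threshold - value) ** delta)


def monte_carlo_criterion(mean, std, threshold, alpha=1.0, delta=1,
                          num_samples=20000, seed=0):
    """Monte Carlo estimate of the criterion at one point (batch size 1 only)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if delta not in (1, 2):
        raise ValueError("delta must be 1 (Bichon) or 2 (Ranjan)")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Assumption: average over posterior draws f ~ N(mean, std^2).
        draw = rng.gauss(mean, std)
        total += integrand(draw, std, threshold, alpha, delta)
    return total / num_samples


# Mean on the threshold with high uncertainty: the most attractive point.
near = monte_carlo_criterion(mean=0.0, std=1.0, threshold=0.0)
# Same uncertainty, but the mean is far from the threshold.
far = monte_carlo_criterion(mean=3.0, std=1.0, threshold=0.0)
# Mean on the threshold, but the model is already confident there.
confident = monte_carlo_criterion(mean=0.0, std=0.1, threshold=0.0)
```

Under these assumptions `near` exceeds both `far` and `confident`, matching the stated goal of sampling where the mean is close to the threshold and the variance is high.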
-
class IntegratedVarianceReduction(integration_points: trieste.types.TensorType, threshold: Optional[Union[float, Sequence[float], trieste.types.TensorType]] = None)
Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.interfaces.FastUpdateModel]
Builder for the reduction of the integral of the predicted variance over the search space given a batch of query points.
- Parameters
integration_points – Set of points over which to integrate the predictive variance.
threshold – Either None, a float, or a sequence of 1 or 2 float values.
-
prepare_acquisition_function(self, model: trieste.models.interfaces.FastUpdateModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
model – The model.
dataset – Unused.
- Returns
The integral of the predictive variance.
-
update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.interfaces.FastUpdateModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
function – The acquisition function to update.
model – The model.
dataset – Unused.
-
class integrated_variance_reduction(model: trieste.models.interfaces.FastUpdateModel, integration_points: trieste.types.TensorType, threshold: Optional[Union[float, Sequence[float], trieste.types.TensorType]] = None)
Bases: trieste.acquisition.interface.AcquisitionFunctionClass
The reduction of the (weighted) average of the predicted variance over the integration points (a.k.a. Integrated Mean Square Error or IMSE criterion). See [PGR+10] for details.
The criterion (to be maximised) is:
\[\int_x (v_{old}(x) - v_{new}(x)) * weights(x),\]
where \(v_{old}(x)\) is the predictive variance of the model at \(x\), and \(v_{new}(x)\) is the updated predictive variance, given that the GP is further conditioned on the query points.
Note that since \(v_{old}(x)\) is constant w.r.t. the query points, this function only returns \(-\int_x v_{new}(x) * weights(x)\).
If no threshold is provided, the goal is to learn a globally accurate model, and the predictive variance (\(v_{new}\)) is used. Otherwise, learning is ‘targeted’ towards regions where the GP is close to particular values, and the variance is weighted by the posterior GP pdf evaluated at the threshold T (if a single value is given) or by the probability that the GP posterior belongs to the interval between the 2 thresholds T1 and T2 (note the slightly different parametrisation compared to [PGR+10] in that case).
This criterion allows batch size > 1. Note that the computational cost grows cubically with the batch size.
This criterion requires a method (conditional_predict_f) to compute the new predictive variance given that query points are added to the data.
- Parameters
model – The model of the objective function.
integration_points – Points over which to integrate the objective prediction variance.
threshold – Either None, a float or a sequence of 1 or 2 float values. See class docs for details.
- Raises
ValueError (or InvalidArgumentError) – If threshold has more than 2 values.
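To make the quantity \(-\int_x v_{new}(x) * weights(x)\) concrete, here is a minimal standard-library sketch for the simplest case: a zero-mean GP prior with a squared-exponential kernel, no existing data, uniform weights (no threshold), and a single query point. The kernel, its hyperparameters, the noise level, and all function names are assumptions chosen for illustration; the closed-form variance update used here stands in for the model's conditional_predict_f method and is not Trieste's implementation.

```python
import math


def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel on scalars (hypothetical hyperparameters)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def updated_variance(x, query, noise=1e-3):
    """Prior-GP variance at x after conditioning on one noisy observation at query."""
    prior = rbf_kernel(x, x)
    cross = rbf_kernel(x, query)
    return prior - cross ** 2 / (rbf_kernel(query, query) + noise)


def negative_integrated_variance(query, integration_points):
    """Minus the average of v_new over the integration points (uniform weights)."""
    total = sum(updated_variance(x, query) for x in integration_points)
    return -total / len(integration_points)


# Integration points on a regular grid over [0, 1].
integration_points = [i / 20 for i in range(21)]

centre_score = negative_integrated_variance(0.5, integration_points)
edge_score = negative_integrated_variance(0.0, integration_points)
outside_score = negative_integrated_variance(5.0, integration_points)
```

A query at the centre of the grid reduces variance on both sides and so scores highest; a query far outside the grid leaves the variance at the integration points essentially unchanged.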
-
__call__(self, x: trieste.types.TensorType) → trieste.types.TensorType
Call acquisition function.
-
class BayesianActiveLearningByDisagreement(jitter: float = DEFAULTS.JITTER)
Bases: trieste.acquisition.interface.SingleModelAcquisitionBuilder[trieste.models.ProbabilisticModel]
Builder for the Bayesian Active Learning By Disagreement acquisition function defined in [HHGL11].
- Parameters
jitter – The size of the jitter used to avoid numerical problems caused by the log operation when the variance is close to zero.
-
prepare_acquisition_function(self, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
model – The model.
dataset – Unused.
- Returns
The Bayesian Active Learning By Disagreement acquisition function.
-
update_acquisition_function(self, function: trieste.acquisition.interface.AcquisitionFunction, model: trieste.models.ProbabilisticModel, dataset: Optional[trieste.data.Dataset] = None) → trieste.acquisition.interface.AcquisitionFunction
- Parameters
function – The acquisition function to update.
model – The model.
dataset – Unused.
-
class bayesian_active_learning_by_disagreement(model: trieste.models.ProbabilisticModel, jitter: float)
Bases: trieste.acquisition.interface.AcquisitionFunctionClass
An AcquisitionFunctionClass is an acquisition function represented using a class rather than as a standalone function. Using a class to represent an acquisition function makes it easier to update it, to avoid having to retrace the function on every call.
The Bayesian active learning by disagreement acquisition function computes the information gain of the predictive entropy [HHGL11]. The acquisition function is calculated by:
\[\mathrm{h}\left(\Phi\left(\frac{\mu_{\boldsymbol{x}, \mathcal{D}}} {\sqrt{\sigma_{\boldsymbol{x}, \mathcal{D}}^{2}+1}}\right)\right) -\frac{C \exp \left(-\frac{\mu_{\boldsymbol{x}, \mathcal{D}}^{2}} {2\left(\sigma_{\boldsymbol{x}, \mathcal{D}}^{2}+C^{2}\right)}\right)} {\sqrt{\sigma_{\boldsymbol{x}, \mathcal{D}}^{2}+C^{2}}}\]
Here \(\mathrm{h}(p)\) is defined as:
\[\mathrm{h}(p)=-p \log p-(1-p) \log (1-p)\]
This acquisition function is intended for use with binary Gaussian process classification models with a Bernoulli likelihood. It is designed for VGP, but other Gaussian approximations of the posterior can be used, for instance SVGP or some other model that is not currently supported by Trieste. Integrating over nuisance parameters is currently not supported (see equation 6 of the paper).
- Parameters
model – The model of the objective function.
jitter – The size of the jitter used to avoid numerical problems caused by the log operation when the variance is close to zero.
- Returns
The Bayesian Active Learning By Disagreement acquisition function.
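The formula above can be evaluated pointwise with a few lines of standard-library Python. This is a sketch under stated assumptions: the constant \(C\) is not defined in the text, so the value \(C = \sqrt{\pi \ln 2 / 2}\) is assumed here following [HHGL11]; flooring the variance at the jitter and clamping the probability away from 0 and 1 before taking logs are choices made for this example. The function names are hypothetical.

```python
import math

# Assumed value of the constant C, following [HHGL11]; not stated in the text above.
C = math.sqrt(math.pi * math.log(2.0) / 2.0)


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binary_entropy(p, eps=1e-12):
    """h(p) = -p log p - (1 - p) log(1 - p), with p clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bald(mean, variance, jitter=1e-6):
    """Evaluate the BALD expression for a latent Gaussian with given mean and variance."""
    variance = max(variance, jitter)  # illustrative guard against tiny variances
    p = normal_cdf(mean / math.sqrt(variance + 1.0))
    second_term = (
        C
        * math.exp(-mean ** 2 / (2.0 * (variance + C ** 2)))
        / math.sqrt(variance + C ** 2)
    )
    return binary_entropy(p) - second_term


# Latent mean on the decision boundary, with high latent uncertainty.
uncertain = bald(mean=0.0, variance=10.0)
# Same mean, but the latent function is already well determined.
confident_boundary = bald(mean=0.0, variance=0.1)
# High uncertainty, but the latent mean is far from the boundary.
far_from_boundary = bald(mean=5.0, variance=10.0)
```

With these inputs the score is largest where the latent mean sits on the decision boundary and the latent variance is high, which is where a new label is expected to be most informative about the model.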
-
__call__(self, x: trieste.types.TensorType) → trieste.types.TensorType
Call acquisition function.