trieste.models.keras
#
This package contains the primary interface for deep neural network models. It also contains a
number of TrainableProbabilisticModel
wrappers for neural network models. Note that
currently copying/saving models is not supported, so when
BayesianOptimizer
is used track_state
should be set
to False.
Submodules#
Package Contents#
-
class
GaussianNetwork
(input_tensor_spec: tensorflow.TensorSpec, output_tensor_spec: tensorflow.TensorSpec, hidden_layer_args: Sequence[dict[str, Any]] = ({'units': 50, 'activation': 'relu'}, {'units': 50, 'activation': 'relu'}), independent: bool = False)[source]# Bases:
KerasEnsembleNetwork
This class defines layers of a probabilistic neural network using Keras. The network architecture is a multilayer fully-connected feed-forward network, with Gaussian distribution as an output. The layers are meant to be built as an ensemble model by
KerasEnsemble
. Note that this is not a Bayesian neural network.
- Parameters
input_tensor_spec – Tensor specification for the input to the network.
output_tensor_spec – Tensor specification for the output of the network.
hidden_layer_args – Specification for building dense hidden layers. Each element in the sequence should be a dictionary containing arguments (keys) and their values for a
Dense
hidden layer. See the Keras Dense layer API for the available arguments. The dictionaries are used in order to add
Dense
layers, so the length of this sequence determines the number of hidden layers in the network. The default is two hidden layers with 50 nodes each and ReLU activation functions. Pass an empty sequence to have no hidden layers.
independent – When multiple outputs are modeled, if set to True then an
IndependentNormal
layer is used as the output layer. This models the outputs as independent, so only the diagonal elements of the covariance matrix are parametrized. If left as the default False, then a
MultivariateNormalTriL
layer is used where correlations between outputs are learned as well.
- Raises
ValueError – If objects in
hidden_layer_args
are not dictionaries.
-
connect_layers
() → tuple[tensorflow.Tensor, tensorflow.Tensor]# Connect all layers in the network. We start by generating an input tensor based on input tensor specification. Next we generate a sequence of hidden dense layers based on hidden layer arguments. Finally, we generate a dense layer whose nodes act as parameters of a Gaussian distribution in the final probabilistic layer.
- Returns
Input and output tensor of the sequence of layers.
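To make the role of the final dense layer more concrete, the sketch below counts how many nodes that layer would need for each choice of `independent`. The helper name `gaussian_params_size` is hypothetical and is not part of the package; the counts follow from the parametrization described above (a mean vector plus either a diagonal scale or a lower-triangular scale factor), so treat this as an illustration rather than the library's code.

```python
def gaussian_params_size(event_size: int, independent: bool) -> int:
    """Number of dense-layer nodes needed to parametrize a Gaussian output.

    Hypothetical helper, for illustration only.
    """
    if event_size < 1:
        raise ValueError("event_size must be a positive integer")
    if independent:
        # One mean and one scale per output: only the diagonal is parametrized.
        return 2 * event_size
    # One mean per output plus the entries of a lower-triangular scale matrix.
    return event_size + event_size * (event_size + 1) // 2


for outputs in (1, 2, 3):
    print(
        outputs,
        gaussian_params_size(outputs, independent=True),
        gaussian_params_size(outputs, independent=False),
    )
```

For a single output the two choices need the same number of nodes, which is why the flag only matters for multi-output models.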
-
class
KerasEnsemble
(networks: Sequence[KerasEnsembleNetwork])[source]# This class builds an ensemble of neural networks using Keras. Individual networks must be instances of
KerasEnsembleNetwork
. This class is meant to be used with the
DeepEnsemble
model wrapper, which compiles the model.
- Parameters
networks – A list of neural network specifications, one for each member of the ensemble. The ensemble will be built using these specifications.
- Raises
ValueError – If there are no objects in
networks
or we try to create a model with networks whose input or output shapes are not the same.
-
property
model
→ tensorflow.keras.Model# Returns built but uncompiled Keras ensemble model.
-
property
ensemble_size
→ int# Returns the size of the ensemble, that is, the number of base learners or individual neural network models in the ensemble.
-
_build_ensemble
() → tensorflow.keras.Model# Builds the ensemble model by combining all the individual networks in a single Keras model. This method relies on
connect_layers
method of
KerasEnsembleNetwork
objects to construct individual networks.
- Returns
The Keras model.
-
class
KerasEnsembleNetwork
(input_tensor_spec: tensorflow.TensorSpec, output_tensor_spec: tensorflow.TensorSpec, network_name: str = '')[source]# This class is an interface that defines necessary attributes and methods for neural networks that are meant to be used for building ensembles by
KerasEnsemble
. Subclasses are not meant to build and compile Keras models; instead, they provide the specification that
KerasEnsemble
will use to build the Keras model.
- Parameters
input_tensor_spec – Tensor specification for the input to the network.
output_tensor_spec – Tensor specification for the output of the network.
network_name – The name to be used when building the network.
-
build_keras_ensemble
(data: trieste.data.Dataset, ensemble_size: int = 5, num_hidden_layers: int = 2, units: int = 25, activation: Union[str, tensorflow.keras.layers.Activation] = 'relu', independent_normal: bool = False) → trieste.models.keras.architectures.KerasEnsemble[source]# Builds a simple ensemble of neural networks in Keras where each network has the same architecture: number of hidden layers, nodes in hidden layers and activation function.
Default ensemble size and activation function seem to work well in practice, in regression type of problems at least. Number of hidden layers and units per layer should be modified according to the dataset size and complexity of the function - the default values seem to work well for small datasets common in Bayesian optimization. Using the independent normal is relevant only if one is modelling multiple output variables, as it simplifies the distribution by ignoring correlations between outputs.
- Parameters
data – Data for training, used for extracting input and output tensor specifications.
ensemble_size – The size of the ensemble, that is, the number of base learners or individual neural networks in the ensemble.
num_hidden_layers – The number of hidden layers in each network.
units – The number of nodes in each hidden layer.
activation – The activation function in each hidden layer.
independent_normal – If set to True then
IndependentNormal
layer is used as the output layer. This models the outputs as independent, so only the diagonal elements of the covariance matrix are parametrized. If left as the default False, then a
MultivariateNormalTriL
layer is used where correlations between outputs are learned as well. Note that this is only relevant for multi-output models.
- Returns
Keras ensemble model.
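As a rough sketch of what "the same architecture" means here, the snippet below expands `num_hidden_layers`, `units` and `activation` into the `hidden_layer_args` format that `GaussianNetwork` accepts, once per ensemble member. The helper `ensemble_layer_specs` is hypothetical and deliberately avoids Keras; it is an assumption about how the arguments map onto each other, not the function's actual implementation.

```python
from typing import Any, Dict, List


def ensemble_layer_specs(
    ensemble_size: int = 5,
    num_hidden_layers: int = 2,
    units: int = 25,
    activation: str = "relu",
) -> List[List[Dict[str, Any]]]:
    """Return one list of Dense-layer argument dictionaries per ensemble member."""
    return [
        [{"units": units, "activation": activation} for _ in range(num_hidden_layers)]
        for _ in range(ensemble_size)
    ]


specs = ensemble_layer_specs()
print(len(specs))  # number of ensemble members
print(specs[0])    # hidden layer arguments of the first member
```

Every member receives an identical specification, so any diversity between the trained networks has to come from elsewhere, such as random weight initialization.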
-
class
DeepEnsembleModel
[source]# Bases:
trieste.models.interfaces.ProbabilisticModel
, typing_extensions.Protocol
This is an interface for deep ensemble type of model, primarily for usage by trajectory samplers, to avoid circular imports. These models can act as probabilistic models by deriving estimates of epistemic uncertainty from the diversity of predictions made by individual models in the ensemble.
-
property
ensemble_size
→ int# Returns the size of the ensemble, that is, the number of base learners or individual models in the ensemble.
-
abstract
ensemble_distributions
(query_points: trieste.types.TensorType) → tuple[tensorflow_probability.distributions.Distribution, Ellipsis]# Return distributions for each member of the ensemble. The type of the output depends on the subclass: it might be a predicted value or a distribution.
- Parameters
query_points – The points at which to return outputs.
- Returns
The outputs for the observations at the specified
query_points
for each member of the ensemble.
-
class
KerasPredictor
(optimizer: Optional[trieste.models.optimizer.KerasOptimizer] = None)[source]# Bases:
trieste.models.interfaces.ProbabilisticModel
, abc.ABC
This is an interface for trainable wrappers of TensorFlow and Keras neural network models.
- Parameters
optimizer – The optimizer wrapper containing the optimizer with which to train the model and arguments for the wrapper and the optimizer. The optimizer must be an instance of an
Optimizer
. Defaults to the
Adam
optimizer with default parameters.
- Raises
ValueError – If the optimizer is not an instance of
Optimizer
.
-
property
model
→ tensorflow.keras.Model# The compiled Keras model.
-
property
optimizer
→ trieste.models.optimizer.KerasOptimizer# The optimizer wrapper for training the model.
-
predict
(query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]# Return the mean and variance of the independent marginal distributions at each point in
query_points
.
This is essentially a convenience method for
predict_joint()
, where non-event dimensions of
query_points
are all interpreted as broadcasting dimensions instead of batch dimensions, and the covariance is squeezed to remove redundant nesting.
- Parameters
query_points – The points at which to make predictions, of shape […, D].
- Returns
The mean and variance of the independent marginal distributions at each point in
query_points
. For a predictive distribution with event shape E, the mean and variance will both have shape […] + E.
-
abstract
sample
(query_points: trieste.types.TensorType, num_samples: int) → trieste.types.TensorType# Return
num_samples
samples from the independent marginal distributions at
query_points
.
- Parameters
query_points – The points at which to sample, with shape […, N, D].
num_samples – The number of samples at each point.
- Returns
The samples. For a predictive distribution with event shape E, this has shape […, S, N] + E, where S is the number of samples.
-
class
DeepEnsemble
(model: trieste.models.keras.architectures.KerasEnsemble, optimizer: Optional[trieste.models.optimizer.KerasOptimizer] = None, bootstrap: bool = False, diversify: bool = False, continuous_optimisation: bool = True)[source]# Bases:
trieste.models.keras.interface.KerasPredictor
, trieste.models.interfaces.TrainableProbabilisticModel
, trieste.models.keras.interface.DeepEnsembleModel
, trieste.models.interfaces.HasTrajectorySampler
A
TrainableProbabilisticModel
wrapper for deep ensembles built using Keras.
Deep ensembles are ensembles of deep neural networks that have been found to represent uncertainty well in practice [lakshminarayanan2017simple]. This makes them a potentially attractive model for Bayesian optimization in use cases with a large number of observations, non-stationary objective functions and a need for fast predictions, in which standard Gaussian process models are likely to struggle. The model consists of simple fully connected multilayer probabilistic networks as base learners, with a Gaussian distribution as the final layer, and uses the negative log-likelihood loss for training the networks. The model relies on differences in the random initialization of weights to generate diversity among base learners.
The original formulation of the model does not include bootstrapping of the data, as the authors found that it does not improve the performance of the model. We include bootstrapping as an option because later work that measured uncertainty quantification more precisely found that bootstrapping does help with uncertainty representation (see [osband2021epistemic]).
We provide classes for constructing ensembles using Keras (
KerasEnsemble
) in the architectures package that should be used with the
DeepEnsemble
wrapper. There we also provide a
GaussianNetwork
base learner following the original formulation in [lakshminarayanan2017simple], but any user-specified network can be supplied, as long as it has a Gaussian distribution as its final layer and follows the
KerasEnsembleNetwork
interface.
A word of caution in case a learning rate scheduler is used in the
fit_args
passed to the
KerasOptimizer
optimizer instance: typically one would not want to continue with the reduced learning rate in the subsequent Bayesian optimization step. Hence, we reset the learning rate to the original one after calling the
fit
method. If this is not the behaviour you want, you will need to subclass the model and override the
optimize()
method.
Currently we do not support setting up the model with a dictionary config.
- Parameters
model – A Keras ensemble model with probabilistic networks as ensemble members. The model has to be built but not compiled.
optimizer – The optimizer wrapper with necessary specifications for compiling and training the model. Defaults to
KerasOptimizer
with the
Adam
optimizer, negative log likelihood loss, mean squared error metric and a dictionary of default arguments for the Keras fit method: 3000 epochs, batch size 16, early stopping callback with patience of 50, and verbose 0. See https://keras.io/api/models/model_training_apis/#fit-method for a list of possible arguments.
bootstrap – Whether to sample the data with replacement when training each network in the ensemble. By default set to False.
diversify – Whether to use quantiles from the approximate Gaussian distribution of the ensemble as trajectories instead of mean predictions when calling
trajectory_sampler()
. This mode can be used to increase diversity when optimizing very large batches of trajectories. By default set to False.
continuous_optimisation – If True (default), the optimizer will keep track of the number of epochs across BO iterations and use this number as initial_epoch. This is essential to allow monitoring of model training across BO iterations.
- Raises
ValueError – If
model
is not an instance of
KerasEnsemble
or the ensemble has fewer than two base learners (networks).
-
property
model
→ tensorflow.keras.Model# Returns compiled Keras ensemble model.
-
property
ensemble_size
→ int# Returns the size of the ensemble, that is, the number of base learners or individual neural network models in the ensemble.
-
prepare_dataset
(dataset: trieste.data.Dataset) → tuple[Dict[str, trieste.types.TensorType], Dict[str, trieste.types.TensorType]]# Transform
dataset
into inputs and outputs with correct names that can be used for training the
KerasEnsemble
model.
If the
bootstrap
argument in the
DeepEnsemble
is set to True, data will additionally be sampled with replacement, independently for each network in the ensemble.
- Parameters
dataset – A dataset with
query_points
and
observations
tensors.
- Returns
A dictionary with input data and a dictionary with output data.
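The following NumPy sketch illustrates the idea of producing one named input and one named output per ensemble member, with an optional independent bootstrap for each. The function `prepare_named_data` and the key names `member_<i>_input` / `member_<i>_output` are assumptions made for this example; the real names are determined by the Keras model.

```python
import numpy as np


def prepare_named_data(query_points, observations, ensemble_size, bootstrap, seed=0):
    """Build per-member input and output dictionaries (illustrative only)."""
    rng = np.random.default_rng(seed)
    num_points = query_points.shape[0]
    inputs, outputs = {}, {}
    for member in range(ensemble_size):
        if bootstrap:
            # Draw row indices with replacement, independently for each member.
            idx = rng.integers(0, num_points, size=num_points)
            x, y = query_points[idx], observations[idx]
        else:
            x, y = query_points, observations
        inputs[f"member_{member}_input"] = x
        outputs[f"member_{member}_output"] = y
    return inputs, outputs


x = np.arange(10, dtype=float).reshape(5, 2)
y = x.sum(axis=1, keepdims=True)
inputs, outputs = prepare_named_data(x, y, ensemble_size=3, bootstrap=True)
print(sorted(inputs))
print(inputs["member_0_input"].shape, outputs["member_0_output"].shape)
```

Indexing inputs and outputs with the same `idx` keeps each query point paired with its observation after resampling.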
-
prepare_query_points
(query_points: trieste.types.TensorType) → Dict[str, trieste.types.TensorType]# Transform
query_points
into inputs with correct names that can be used for predicting with the model.
- Parameters
query_points – A tensor with
query_points
.
- Returns
A dictionary with query_points prepared for predictions.
-
ensemble_distributions
(query_points: trieste.types.TensorType) → tuple[tensorflow_probability.python.distributions.Distribution, Ellipsis]# Return distributions for each member of the ensemble.
- Parameters
query_points – The points at which to return distributions.
- Returns
The distributions for the observations at the specified
query_points
for each member of the ensemble.
-
predict
(query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]# Returns mean and variance at
query_points
for the whole ensemble.
Following [lakshminarayanan2017simple] we treat the ensemble as a uniformly-weighted Gaussian mixture model and combine the predictions as
\[p(y|\mathbf{x}) = M^{-1} \sum_{m=1}^M \mathcal{N} (\mu_{\theta_m}(\mathbf{x}),\,\sigma_{\theta_m}^{2}(\mathbf{x}))\]
We further approximate the ensemble prediction as a Gaussian whose mean and variance are respectively the mean and variance of the mixture, given by
\[\mu_{*}(\mathbf{x}) = M^{-1} \sum_{m=1}^M \mu_{\theta_m}(\mathbf{x})\]
\[\sigma^2_{*}(\mathbf{x}) = M^{-1} \sum_{m=1}^M (\sigma_{\theta_m}^{2}(\mathbf{x}) + \mu^2_{\theta_m}(\mathbf{x})) - \mu^2_{*}(\mathbf{x})\]
This method assumes that the final layer in each member of the ensemble is probabilistic, an instance of
Distribution
. In particular, given the nature of the approximations stated above, the final layer should be a Gaussian distribution with mean and variance methods.
- Parameters
query_points – The points at which to make predictions.
- Returns
The predicted mean and variance of the observations at the specified
query_points
.
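The mixture moments above can be computed directly from the per-member means and variances. The NumPy sketch below does this for arrays of shape [M, N, 1]; the function name `mixture_mean_and_variance` is hypothetical, and the code stands in for the TensorFlow implementation rather than reproducing it.

```python
import numpy as np


def mixture_mean_and_variance(means, variances):
    """Moments of a uniformly weighted Gaussian mixture.

    means, variances: arrays of shape [M, N, 1], one slice per ensemble member.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mixture_mean = means.mean(axis=0)
    # Average second moment of the members minus the squared mixture mean.
    mixture_variance = (variances + means**2).mean(axis=0) - mixture_mean**2
    return mixture_mean, mixture_variance


# Two members that agree on the noise level but disagree on the mean.
member_means = np.array([[[0.0]], [[2.0]]])
member_variances = np.array([[[1.0]], [[1.0]]])
mean, variance = mixture_mean_and_variance(member_means, member_variances)
print(mean, variance)  # mixture mean is 1.0 and mixture variance is 2.0
```

In this example each member predicts unit variance, and the extra unit of variance in the mixture comes entirely from the disagreement between the member means.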
-
predict_ensemble
(query_points: trieste.types.TensorType) → tuple[trieste.types.TensorType, trieste.types.TensorType]# Returns mean and variance at
query_points
for each member of the ensemble. The first tensor is the mean and the second is the variance; each has shape […, M, N, 1], where M is the
ensemble_size
.
This method assumes that the final layer in each member of the ensemble is probabilistic, an instance of
tfp.distributions.Distribution
; in particular, mean and variance methods should be available.
- Parameters
query_points – The points at which to make predictions.
- Returns
The predicted mean and variance of the observations at the specified
query_points
for each member of the ensemble.
-
sample
(query_points: trieste.types.TensorType, num_samples: int) → trieste.types.TensorType# Return
num_samples
samples at
query_points
. We use the mixture approximation in
predict()
for
query_points
and sample
num_samples
times from a Gaussian distribution given by the predicted mean and variance.
- Parameters
query_points – The points at which to sample, with shape […, N, D].
num_samples – The number of samples at each point.
- Returns
The samples. For a predictive distribution with event shape E, this has shape […, S, N] + E, where S is the number of samples.
-
sample_ensemble
(query_points: trieste.types.TensorType, num_samples: int) → trieste.types.TensorType# Return
num_samples
samples at
query_points
. Each sample is taken from a Gaussian distribution given by the predicted mean and variance of a randomly chosen network in the ensemble. This avoids using the Gaussian mixture approximation and samples directly from individual Gaussian distributions given by each network in the ensemble.
- Parameters
query_points – The points at which to sample, with shape […, N, D].
num_samples – The number of samples at each point.
- Returns
The samples. For a predictive distribution with event shape E, this has shape […, S, N] + E, where S is the number of samples.
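A minimal NumPy sketch of this sampling scheme is given below, assuming the per-member means and variances are already available as arrays of shape [M, N, E] and that outputs are treated as independent. The function name `sample_from_random_members` is hypothetical.

```python
import numpy as np


def sample_from_random_members(means, variances, num_samples, seed=0):
    """Draw samples of shape [S, N, E] from randomly chosen ensemble members."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    ensemble_size = means.shape[0]
    # Pick one ensemble member for each of the S samples.
    member = rng.integers(0, ensemble_size, size=num_samples)
    # Standard normal noise, scaled by the chosen member's standard deviation.
    noise = rng.standard_normal((num_samples,) + means.shape[1:])
    return means[member] + np.sqrt(variances[member]) * noise


member_means = np.stack([np.zeros((4, 1)), np.full((4, 1), 10.0)])
member_variances = np.full((2, 4, 1), 0.01)
samples = sample_from_random_members(member_means, member_variances, num_samples=6)
print(samples.shape)  # (6, 4, 1)
```

Because each sample comes from a single member, the samples retain the multimodality of the ensemble, which the single-Gaussian approximation used by `sample` smooths away.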
-
trajectory_sampler
() → trieste.models.interfaces.TrajectorySampler[DeepEnsemble]# Return a trajectory sampler. For
DeepEnsemble
, we use an ensemble sampler that randomly picks a network from the ensemble and uses its predicted means for generating a trajectory, or optionally randomly sampled quantiles rather than means.
- Returns
The trajectory sampler.
-
update
(dataset: trieste.data.Dataset) → None# Neural networks are parametric models and do not need to update data. The TrainableProbabilisticModel interface, however, requires an update method, so this method does nothing.
-
optimize
(dataset: trieste.data.Dataset) → None# Optimize the underlying Keras ensemble model with the specified
dataset
.
Optimization is performed by using the Keras fit method, rather than applying the optimizer and using the batches supplied with the optimizer wrapper. Users can pass arguments to the fit method through the
fit_args
argument in the optimizer wrapper. These default to the values listed for the optimizer parameter of this class: 3000 epochs, batch size 16, an early stopping callback with patience of 50, and verbose 0. See https://keras.io/api/models/model_training_apis/#fit-method for a list of possible arguments.
Note that optimization does not return the result; instead, optimization results are stored in a history attribute of the model object.
- Parameters
dataset – The data with which to optimize the model.
-
log
(dataset: Optional[trieste.data.Dataset] = None) → None# Log model training information at a given optimization step to the Tensorboard. We log several summary statistics of losses and metrics given in
fit_args
to
optimizer
(final, difference between initial and final loss, min and max). We also log epoch statistics, but as histograms rather than time series, and several metrics based on the training data, such as the root mean square error between predictions and observations, among others.
We do not log statistics of individual models in the ensemble unless specifically switched on with
trieste.logging.set_summary_filter(lambda name: True)
.
For custom logs, users will need to subclass the model and override this method.
- Parameters
dataset – Optional data that can be used to log additional data-based model summaries.
-
class
DeepEnsembleTrajectorySampler
(model: trieste.models.keras.interface.DeepEnsembleModel, diversify: bool = False, seed: Optional[int] = None)[source]# Bases:
trieste.models.interfaces.TrajectorySampler
[trieste.models.keras.interface.DeepEnsembleModel
]
This class builds functions that approximate a trajectory by randomly choosing a network from the ensemble and using its predicted means as a trajectory.
Option diversify can be used to increase the diversity in case of optimizing very large batches of trajectories. We use quantiles from the approximate Gaussian distribution of the ensemble as trajectories, with randomly chosen quantiles approximating a trajectory and using a reparametrisation trick to speed up computation. Note that quantiles are not true trajectories, so this will likely have some performance costs.
- Parameters
model – The ensemble model to sample from.
diversify – Whether to use quantiles from the approximate Gaussian distribution of the ensemble as trajectories (False by default). See class docstring for details.
seed – Random number seed to use for trajectory sampling.
- Raises
NotImplementedError – If the model is not an instance of
DeepEnsembleModel
.
-
get_trajectory
() → trieste.models.interfaces.TrajectoryFunction# Generate an approximate function draw (trajectory) from the ensemble.
- Returns
A trajectory function representing an approximate trajectory from the model, taking an input of shape [N, B, D] and returning shape [N, B, L].
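The shape contract above can be illustrated with a small NumPy sketch in which each ensemble member is represented by a plain function mapping [N, D] inputs to [N, L] mean predictions. The factory `make_mean_trajectory` and the toy members are hypothetical; the sketch only shows the idea of fixing one randomly chosen member per batch element.

```python
import numpy as np


def make_mean_trajectory(member_mean_fns, batch_size, seed=0):
    """Return a function mapping [N, B, D] inputs to [N, B, L] outputs."""
    rng = np.random.default_rng(seed)
    # One randomly chosen ensemble member per batch element, fixed at creation.
    chosen = rng.integers(0, len(member_mean_fns), size=batch_size)

    def trajectory(x):
        x = np.asarray(x, dtype=float)
        if x.ndim != 3 or x.shape[1] != batch_size:
            raise ValueError("expected input of shape [N, B, D] with B == batch_size")
        # Evaluate each batch slice [N, D] with its own member's mean function.
        columns = [member_mean_fns[m](x[:, b, :]) for b, m in enumerate(chosen)]
        return np.stack(columns, axis=1)

    return trajectory


# Two toy "networks" that disagree in sign.
members = [
    lambda x: x.sum(axis=-1, keepdims=True),
    lambda x: -x.sum(axis=-1, keepdims=True),
]
trajectory = make_mean_trajectory(members, batch_size=3)
values = trajectory(np.ones((4, 3, 2)))
print(values.shape)  # (4, 3, 1)
```

Calling the returned function repeatedly gives the same values, since the member choice is made once when the trajectory is created; resampling corresponds to drawing `chosen` again.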
-
update_trajectory
(trajectory: trieste.models.interfaces.TrajectoryFunction) → trieste.models.interfaces.TrajectoryFunction# Update a
TrajectoryFunction
to reflect an update in its underlying
DeepEnsembleModel
and resample accordingly.
Here we rely on the underlying models being updated and we only resample the trajectory.
- Parameters
trajectory – The trajectory function to be resampled.
- Returns
The new trajectory function updated for a new model.
-
resample_trajectory
(trajectory: trieste.models.interfaces.TrajectoryFunction) → trieste.models.interfaces.TrajectoryFunction# Efficiently resample a
TrajectoryFunction
in-place to avoid function retracing with every new sample.
- Parameters
trajectory – The trajectory function to be resampled.
- Returns
The new resampled trajectory function.
-
class
deep_ensemble_trajectory
(model: trieste.models.keras.interface.DeepEnsembleModel, diversify: bool, seed: Optional[int] = None)[source]# Bases:
trieste.models.interfaces.TrajectoryFunctionClass
Generate an approximate function draw (trajectory) by randomly choosing a batch B of networks from the ensemble and using their predicted means as trajectories.
Option diversify can be used to increase the diversity in case of optimizing very large batches of trajectories. We use quantiles from the approximate Gaussian distribution of the ensemble as trajectories, with randomly chosen quantiles approximating a trajectory and using a reparametrisation trick to speed up computation. Note that quantiles are not true trajectories, so this will likely have some performance costs.
- Parameters
model – The model of the objective function.
diversify – Whether to use samples from final probabilistic layer as trajectories or mean predictions.
seed – Optional RNG seed.
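The quantile-based option can be sketched as follows, under the assumption that the reparametrisation trick amounts to drawing one standard normal value per batch element and reusing it at every input point. The names `make_quantile_trajectory`, `predict_fn` and `unit_predict` are hypothetical; `predict_fn` stands in for the ensemble's approximate Gaussian prediction and returns a mean and a variance of shape [N, B, L].

```python
import numpy as np


def make_quantile_trajectory(predict_fn, batch_size, seed=0):
    """Return a function mapping [N, B, D] inputs to [N, B, L] quantile values."""
    rng = np.random.default_rng(seed)
    # One fixed standard normal draw per batch element selects its quantile.
    eps = rng.standard_normal((1, batch_size, 1))

    def trajectory(x):
        mean, variance = predict_fn(np.asarray(x, dtype=float))
        # Reparametrisation: shift and scale the fixed noise instead of resampling.
        return mean + np.sqrt(variance) * eps

    return trajectory


def unit_predict(x):
    """Toy predictor with zero mean and unit variance everywhere."""
    shape = x.shape[:-1] + (1,)
    return np.zeros(shape), np.ones(shape)


quantile_trajectory = make_quantile_trajectory(unit_predict, batch_size=3)
out = quantile_trajectory(np.zeros((5, 3, 2)))
print(out.shape)  # (5, 3, 1)
```

Because `eps` is fixed, each batch element follows a single quantile of the predictive distribution across all inputs, which is why such curves are not true function draws.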
-
__call__
(x: trieste.types.TensorType) → trieste.types.TensorType# Call trajectory function. Note that we are flattening the batch dimension and doing a forward pass with each network in the ensemble with the whole batch. This is somewhat wasteful, but is necessary given the underlying
KerasEnsemble
network model.
-
get_state
() → Dict[str, trieste.types.TensorType]# Return internal state variables.
-
get_tensor_spec_from_data
(dataset: trieste.data.Dataset) → tuple[tensorflow.TensorSpec, tensorflow.TensorSpec][source]# Extract tensor specifications for inputs and outputs of neural network models, based on the dataset. This utility facilitates constructing neural networks by providing the required dimensions for the input and the output of the network. For example:
>>> import tensorflow as tf
>>> from trieste.data import Dataset
>>> from trieste.models.keras import get_tensor_spec_from_data
>>> data = Dataset(
...     tf.constant([[0.1, 0.2], [0.3, 0.4]]),
...     tf.constant([[0.5], [0.7]])
... )
>>> input_spec, output_spec = get_tensor_spec_from_data(data)
>>> input_spec
TensorSpec(shape=(2,), dtype=tf.float32, name='query_points')
>>> output_spec
TensorSpec(shape=(1,), dtype=tf.float32, name='observations')
- Parameters
dataset – A dataset with
query_points
and
observations
tensors.
- Returns
Tensor specification objects for the
query_points
and
observations
tensors.
- Raises
ValueError – If the dataset is not an instance of
Dataset
.
-
negative_log_likelihood
(y_true: trieste.types.TensorType, y_pred: tensorflow_probability.distributions.Distribution) → trieste.types.TensorType[source]# Maximum likelihood objective function for training neural networks.
- Parameters
y_true – The output variable values.
y_pred – The output layer of the model. It has to be a probabilistic neural network with a distribution as a final layer.
- Returns
Negative log likelihood values.
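For intuition, the sketch below evaluates this objective for the simplest case of a univariate Gaussian output, using only the standard library. The function `gaussian_negative_log_likelihood` is hypothetical and assumes the predicted distribution is summarized by a mean and a variance; the library function instead works with a distribution object.

```python
import math


def gaussian_negative_log_likelihood(y_true: float, mean: float, variance: float) -> float:
    """Negative log density of y_true under a Gaussian with the given mean and variance."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return 0.5 * (math.log(2.0 * math.pi * variance) + (y_true - mean) ** 2 / variance)


# A perfect mean prediction is penalized only through the predicted variance.
print(gaussian_negative_log_likelihood(0.0, 0.0, 1.0))
# A confident but wrong prediction is penalized more than an uncertain one.
print(gaussian_negative_log_likelihood(2.0, 0.0, 0.1))
print(gaussian_negative_log_likelihood(2.0, 0.0, 4.0))
```

The loss rewards small variances only when the mean is accurate, which is what lets the networks learn input-dependent noise levels.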
-
sample_with_replacement
(dataset: trieste.data.Dataset) → trieste.data.Dataset[source]# Create a new
dataset
with data sampled with replacement. This function is useful for creating bootstrap samples of data for training ensembles.
- Parameters
dataset – The data that should be sampled.
- Returns
A (new)
dataset
with sampled data.
- Raises
ValueError (or InvalidArgumentError) – If the dataset is not an instance of
Dataset
or it is empty.
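A bootstrap sample of this kind can be sketched in NumPy as below. The function `bootstrap_sample` is hypothetical and operates on plain arrays rather than a `Dataset`; it assumes the resampled data has the same number of points as the original.

```python
import numpy as np


def bootstrap_sample(query_points, observations, seed=0):
    """Return query points and observations resampled with replacement."""
    query_points = np.asarray(query_points)
    observations = np.asarray(observations)
    num_points = query_points.shape[0]
    if num_points == 0:
        raise ValueError("cannot sample from an empty dataset")
    if observations.shape[0] != num_points:
        raise ValueError("query_points and observations must have the same length")
    rng = np.random.default_rng(seed)
    # The same indices are used for both arrays so that pairs stay aligned.
    idx = rng.integers(0, num_points, size=num_points)
    return query_points[idx], observations[idx]


x = np.arange(12, dtype=float).reshape(6, 2)
y = x.sum(axis=1, keepdims=True)
x_boot, y_boot = bootstrap_sample(x, y)
print(x_boot.shape, y_boot.shape)  # (6, 2) (6, 1)
```

Some rows typically appear more than once and others not at all, which is the source of the extra diversity between ensemble members trained on different bootstrap samples.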