gpflux.layers#
Layers
Package Contents#
- class BayesianDenseLayer(input_dim: int, output_dim: int, num_data: int, w_mu: numpy.ndarray | None = None, w_sqrt: numpy.ndarray | None = None, activation: Callable | None = None, is_mean_field: bool = True, temperature: float = 0.0001)[source]#
Bases:
gpflux.layers.trackable_layer.TrackableLayer
A dense (fully-connected) layer for variational Bayesian neural networks.
This layer holds the mean and square root of the variance of the distribution over the weights. It also has a temperature for cooling (or heating) the posterior.
- Parameters:
input_dim – The input dimension (excluding bias) of this layer.
output_dim – The output dimension of this layer.
num_data – The number of points in the training dataset (used for scaling the KL regulariser).
w_mu – Initial value of the variational mean for weights + bias. If not specified, this defaults to xavier_initialization_numpy for the weights and zero for the bias.
w_sqrt – Initial value of the variational Cholesky of the (co)variance for weights + bias. If not specified, this defaults to 1e-5 * Identity.
activation – The activation function. If not specified, this defaults to the identity.
is_mean_field – Determines whether the approximation to the weight posterior is mean field. Must be consistent with the shape of w_sqrt, if specified.
temperature – The KL loss will be scaled by this factor. Can be used for cooling (< 1.0) or heating (> 1.0) the posterior. As suggested in “How Good is the Bayes Posterior in Deep Neural Networks Really?” by Wenzel et al. (2020), the default value is a cold 1e-4.
- build(input_shape: gpflux.types.ShapeType) None [source]#
Build the variables necessary on first call.
- predict_samples(inputs: gpflow.base.TensorType, *, num_samples: int | None = None) tf.Tensor [source]#
Samples from the approximate posterior at N test inputs, with input_dim = D, output_dim = Q.
- Parameters:
inputs – The inputs to predict at; shape [N, D].
num_samples – The number of samples, S, to draw.
- Returns:
Samples, shape [S, N, Q] if S is not None else [N, Q].
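For illustration, a minimal sketch of constructing a BayesianDenseLayer and drawing posterior samples; the data, dimensions and hyperparameters below are assumptions for the example, not part of the API:

import numpy as np

from gpflux.layers import BayesianDenseLayer

X = np.random.randn(100, 3)  # hypothetical training inputs: N = 100, D = 3

layer = BayesianDenseLayer(
    input_dim=3,
    output_dim=2,
    num_data=len(X),       # scales the KL regulariser
    is_mean_field=True,    # mean-field approximate posterior over the weights
)
layer.build(X.shape)       # create the variational variables (see build() above)

# Draw S = 5 samples from the approximate posterior at the inputs;
# the result has shape [S, N, Q] = [5, 100, 2].
samples = layer.predict_samples(X, num_samples=5)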
- class GPLayer(kernel: gpflow.kernels.MultioutputKernel, inducing_variable: gpflow.inducing_variables.MultioutputInducingVariables, num_data: int, mean_function: gpflow.mean_functions.MeanFunction | None = None, *, num_samples: int | None = None, full_cov: bool = False, full_output_cov: bool = False, num_latent_gps: int = None, whiten: bool = True, name: str | None = None, verbose: bool = True)[source]#
Bases:
tfp.layers.DistributionLambda
A sparse variational multioutput GP layer. This layer holds the kernel, inducing variables, variational distribution, and mean function.
- Parameters:
kernel – The multioutput kernel for this layer.
inducing_variable – The inducing features for this layer.
num_data – The number of points in the training dataset (see num_data).
mean_function – The mean function that will be applied to the inputs. Default: Identity.
Note: The Identity mean function requires the input and output dimensionality of this layer to be the same. If you want to change the dimensionality in a layer, you may want to provide a Linear mean function instead.
num_samples – The number of samples to draw when converting the DistributionLambda into a tf.Tensor; see _convert_to_tensor_fn(). Will be stored in the num_samples attribute. If None (the default), draw a single sample without prefixing the sample shape (see tfp.distributions.Distribution’s sample() method).
full_cov – Sets the default behaviour of calling this layer (full_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to inputs; if True, predict full covariance over inputs.
full_output_cov – Sets the default behaviour of calling this layer (full_output_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to outputs; if True, predict full covariance over outputs.
num_latent_gps – The number of (latent) GPs in the layer (which can be different from the number of outputs, e.g. with a LinearCoregionalization kernel). This is used to determine the size of the variational parameters q_mu and q_sqrt. If possible, it is inferred from the kernel and inducing_variable.
whiten – If True (the default), uses the whitened parameterisation of the inducing variables; see whiten.
name – The name of this layer.
verbose – The verbosity mode. Set this parameter to True to show debug information.
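As an illustration, a minimal sketch of constructing a GPLayer from a shared-independent multioutput kernel and inducing variables; the dimensions, kernel choice and inducing-point locations below are assumptions for the example:

import numpy as np
import gpflow

from gpflux.layers import GPLayer

num_data, input_dim, output_dim, num_inducing = 200, 2, 1, 20

# Shared-independent multioutput kernel and inducing variables from GPflow.
kernel = gpflow.kernels.SharedIndependent(
    gpflow.kernels.SquaredExponential(), output_dim=output_dim
)
Z = np.random.randn(num_inducing, input_dim)  # hypothetical inducing locations
inducing_variable = gpflow.inducing_variables.SharedIndependentInducingVariables(
    gpflow.inducing_variables.InducingPoints(Z)
)

gp_layer = GPLayer(
    kernel,
    inducing_variable,
    num_data=num_data,          # scales the KL term in the ELBO
    num_latent_gps=output_dim,  # sizes q_mu and q_sqrt
)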
- num_data: int#
The number of points in the training dataset. This information is used to obtain the correct scaling between the data-fit and the KL term in the evidence lower bound (ELBO).
- whiten: bool#
This parameter determines the parameterisation of the inducing variables.
If True, this layer uses the whitened (or non-centred) representation, in which (in the case of inducing point inducing variables) u = f(Z) = cholesky(Kuu) v, and we parameterise an approximate posterior on v as q(v) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on v is p(v) = N(0, I).
If False, this layer uses the non-whitened (or centred) representation, in which we directly parameterise q(u) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on u is p(u) = N(0, Kuu).
- num_samples: int | None#
The number of samples drawn when coercing the output distribution of this layer to a tf.Tensor. (See _convert_to_tensor_fn().)
- full_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to inputs. If True, predict or sample with the full covariance over the inputs.
- full_output_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to outputs. If True, predict or sample with the full covariance over the outputs.
- q_mu: gpflow.Parameter#
The mean of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- q_sqrt: gpflow.Parameter#
The lower-triangular Cholesky factor of the covariance of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- predict(inputs: gpflow.base.TensorType, *, full_cov: bool = False, full_output_cov: bool = False) Tuple[tf.Tensor, tf.Tensor] [source]#
Make a prediction at N test inputs for the Q outputs of this layer, including the mean function contribution.
The covariance and its shape are determined by full_cov and full_output_cov as follows:
(co)variance shape      full_output_cov=False    full_output_cov=True
full_cov=False          [N, Q]                   [N, Q, Q]
full_cov=True           [Q, N, N]                [N, Q, N, Q]
- Parameters:
inputs – The inputs to predict at, with a shape of [N, D], where D is the input dimensionality of this layer.
full_cov – Whether to return full covariance (if True) or marginal variance (if False, the default) w.r.t. inputs.
full_output_cov – Whether to return full covariance (if True) or marginal variance (if False, the default) w.r.t. outputs.
- Returns:
The posterior mean (shape [N, Q]) and (co)variance (shape as above) at the test points.
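Continuing the GPLayer sketch above, a hypothetical call illustrating the covariance shapes in the table (test inputs are assumptions):

X_test = np.random.randn(5, 2)                        # N = 5 test inputs, D = 2

mean, var = gp_layer.predict(X_test)                  # mean: [5, 1], var: [5, 1]
mean, cov = gp_layer.predict(X_test, full_cov=True)   # cov: [1, 5, 5]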
- call(inputs: gpflow.base.TensorType, *args: List[Any], **kwargs: Dict[str, Any]) tf.Tensor [source]#
The default behaviour upon calling this layer.
This method calls the tfp.layers.DistributionLambda super-class call method, which constructs a tfp.distributions.Distribution for the predictive distributions at the input points (see _make_distribution_fn()). You can pass this distribution to tf.convert_to_tensor, which will return samples from the distribution (see _convert_to_tensor_fn()).
This method also adds a layer-specific loss function, given by the KL divergence between this layer and the GP prior (scaled per datapoint).
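A small sketch of this behaviour, continuing the example above (variable names are assumptions):

import tensorflow as tf

f_dist = gp_layer(X_test)                 # a tfp.distributions.Distribution over f
f_sample = tf.convert_to_tensor(f_dist)   # num_samples is None: one sample, shape [5, 1]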
- prior_kl() tf.Tensor [source]#
Returns the KL divergence KL[q(u)∥p(u)] from the prior p(u) to the variational distribution q(u). If this layer uses the whitened representation, returns KL[q(v)∥p(v)].
- _make_distribution_fn(previous_layer_outputs: gpflow.base.TensorType) tfp.distributions.Distribution [source]#
Construct the posterior distributions at the output points of the previous layer, depending on full_cov and full_output_cov.
- Parameters:
previous_layer_outputs – The output from the previous layer, which should be coercible to a tf.Tensor.
- _convert_to_tensor_fn(distribution: tfp.distributions.Distribution) tf.Tensor [source]#
Convert the predictive distributions at the input points (see _make_distribution_fn()) to a tensor of num_samples samples from that distribution. Whether the samples are correlated or marginal (uncorrelated) depends on full_cov and full_output_cov.
- sample() gpflux.sampling.sample.Sample [source]#
Todo
TODO: Document this.
- class LatentVariableLayer(prior: tfp.distributions.Distribution, encoder: gpflow.keras.tf_keras.layers.Layer, compositor: gpflow.keras.tf_keras.layers.Layer | None = None, name: str | None = None)[source]#
Bases:
LayerWithObservations
A latent variable layer, with amortized mean-field variational inference.
The latent variable is distribution-agnostic, but assumes a variational posterior that is fully factorised and is of the same distribution family as the prior.
This class is used by models as described in [DSHD18, SDHD19].
- Parameters:
prior – A distribution that represents the prior over the latent variable.
encoder – A layer which is passed the concatenated observation inputs and targets, and returns the appropriate parameters for the approximate posterior distribution; see encoder.
compositor – A layer that combines layer inputs and latent variable samples into a single tensor; see compositor. If you do not specify a value for this parameter, the default is tf.keras.layers.Concatenate(axis=-1, dtype=default_float()). Note that you should set the dtype of the layer to GPflow’s default dtype as in default_float().
name – The name of this layer (passed through to tf.keras.layers.Layer).
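A minimal sketch of constructing a LatentVariableLayer with a standard-normal prior; the DirectlyParameterizedNormalDiag encoder from gpflux.encoders is used here for brevity, and the sizes are illustrative assumptions:

import tensorflow as tf
import tensorflow_probability as tfp
from gpflow import default_float

from gpflux.encoders import DirectlyParameterizedNormalDiag
from gpflux.layers import LatentVariableLayer

num_data, latent_dim = 100, 2

# Standard-normal prior over the latent variable, in GPflow's default dtype.
prior = tfp.distributions.MultivariateNormalDiag(
    loc=tf.zeros(latent_dim, dtype=default_float()),
    scale_diag=tf.ones(latent_dim, dtype=default_float()),
)
encoder = DirectlyParameterizedNormalDiag(num_data, latent_dim)

lv_layer = LatentVariableLayer(prior=prior, encoder=encoder)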
- prior: tfp.distributions.Distribution#
The prior distribution for the latent variables.
- encoder: gpflow.keras.tf_keras.layers.Layer#
An encoder that maps from a concatenation of inputs and targets to the parameters of the approximate posterior distribution of the corresponding latent variables.
- compositor: gpflow.keras.tf_keras.layers.Layer#
A layer that takes as input the two-element [layer_inputs, latent_variable_samples] list and combines the elements into a single output tensor.
- call(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType | None = None, training: bool | None = None, seed: int | None = None) tf.Tensor [source]#
Sample the latent variables and compose them with the layer input.
When training, draw a sample of the latent variable from the posterior, whose distribution is parameterised by the encoder mapping from the data. Also add a KL divergence [posterior∥prior] to the losses.
When not training, draw a sample of the latent variable from the prior.
- Parameters:
layer_inputs – The output of the previous layer.
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively. This parameter should be passed only when in training mode.
training – The training mode indicator.
seed – A random seed for the sampling operation.
- Returns:
Samples of the latent variable composed with the layer inputs through the compositor.
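Continuing the sketch above, a hypothetical call in training and prediction mode (shapes and data are assumptions):

import numpy as np

X = np.random.randn(num_data, 5)   # layer inputs, Din = 5
Y = np.random.randn(num_data, 1)   # targets, Dout = 1

# Training mode: sample from the encoder-parameterised posterior;
# a KL [posterior∥prior] loss is added to the layer's losses.
out_train = lv_layer(X, observations=[X, Y], training=True)  # shape [num_data, 5 + latent_dim]

# Prediction mode: sample from the prior instead.
out_predict = lv_layer(X)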
- _inference_posteriors(observations: gpflux.types.ObservationType, training: bool | None = None) tfp.distributions.Distribution [source]#
Return the posterior distributions parametrised by the encoder, which gets called with the concatenation of the inputs and targets in the observations argument.
Todo
We might want to change encoders to have a tfp.layers.DistributionLambda final layer that directly returns the appropriately parameterised distributions object.
- Parameters:
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively.
training – The training mode indicator (passed through to the encoder’s call).
- Returns:
The posterior distributions object.
- _inference_latent_samples_and_loss(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType, seed: int | None = None) Tuple[tf.Tensor, tf.Tensor] [source]#
Sample latent variables during the training forward pass, hence requiring the observations. Also return the KL loss per datapoint.
- Parameters:
layer_inputs – The output of the previous layer (unused).
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively.
seed – A random seed for the sampling operation.
- Returns:
The samples and the loss-per-datapoint.
- _prediction_latent_samples(layer_inputs: gpflow.base.TensorType, seed: int | None = None) tf.Tensor [source]#
Sample latent variables during the prediction forward pass, only depending on the shape of this layer’s inputs.
- Parameters:
layer_inputs – The output of the previous layer (for determining batch shape).
seed – A random seed for the sampling operation.
- Returns:
The samples.
- _local_kls(posteriors: tfp.distributions.Distribution) tf.Tensor [source]#
Compute the KL divergences [posteriors∥prior].
- Parameters:
posteriors – A distribution that represents the approximate posteriors.
- Returns:
The KL divergences from the prior for each of the posteriors.
- class LayerWithObservations[source]#
Bases:
gpflux.layers.trackable_layer.TrackableLayer
By inheriting from this class, Layers indicate that their call() method takes a second observations argument after the customary layer_inputs argument.
This is used to distinguish which layers (unlike most standard Keras layers) require the original inputs and/or targets during training. For example, it is used by the amortized variational inference in the LatentVariableLayer.
- abstract call(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType | None = None, training: bool | None = None) tf.Tensor [source]#
The call() method of LayerWithObservations subclasses should accept a second argument, observations. In training mode, this will be the [inputs, targets] of the training points; otherwise, it is None.
- class LikelihoodLayer(likelihood: gpflow.likelihoods.Likelihood)[source]#
Bases:
gpflux.layers.trackable_layer.TrackableLayer
A Keras layer that wraps a GPflow Likelihood. This layer expects a tfp.distributions.MultivariateNormalDiag as its input, describing q(f). When training, calling this class computes the negative variational expectation \(-\mathbb{E}_{q(f)}[\log p(y|f)]\) and adds it as a layer loss. When not training, it computes the mean and variance of y under q(f) using predict_mean_and_var().
Note
Use either this LikelihoodLayer (together with gpflux.models.DeepGP) or LikelihoodLoss (e.g. together with a tf.keras.Sequential model). Do not use both at once because this would add the loss twice.
- call(inputs: tfp.distributions.MultivariateNormalDiag, targets: gpflow.base.TensorType | None = None, training: bool = None) LikelihoodOutputs [source]#
When training (training=True), this method computes variational expectations (data-fit loss) and adds this information as a layer loss. When testing (the default), it computes the posterior mean and variance of y.
- Parameters:
inputs – The output distribution of the previous layer. This is currently expected to be a MultivariateNormalDiag; that is, the preceding GPLayer should have full_cov=full_output_cov=False.
- Returns:
A LikelihoodOutputs tuple with the mean and variance of f and, if not training, the mean and variance of y.
Todo
Turn this layer into a DistributionLambda as well and return the correct Distribution instead of a tuple containing mean and variance only.
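As a sketch of the pattern recommended in the note above, wrapping a GPflow Gaussian likelihood and combining it with a GP layer in a gpflux.models.DeepGP; the gp_layer variable and the hyperparameters are assumptions carried over from the earlier GPLayer example:

import gpflow

from gpflux.layers import LikelihoodLayer
from gpflux.models import DeepGP

likelihood_layer = LikelihoodLayer(gpflow.likelihoods.Gaussian(variance=0.1))

# The LikelihoodLayer adds the variational-expectation loss during training,
# so no separate LikelihoodLoss should be used with this model.
deep_gp = DeepGP([gp_layer], likelihood_layer)
training_model = deep_gp.as_training_model()
training_model.compile(optimizer="adam")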
- class TrackableLayer[source]#
Bases:
gpflow.keras.tf_keras.layers.Layer
With the release of TensorFlow 2.5, our TrackableLayer workaround is no longer needed. See Prowler-io/gpflux#189. This will be removed in GPflux version 1.0.0.