gpflux.layers#
Layers
Package Contents#
- class BayesianDenseLayer(input_dim: int, output_dim: int, num_data: int, w_mu: numpy.ndarray | None = None, w_sqrt: numpy.ndarray | None = None, activation: Callable | None = None, is_mean_field: bool = True, temperature: float = 0.0001)[source]#
Bases: gpflux.layers.trackable_layer.TrackableLayer
A dense (fully-connected) layer for variational Bayesian neural networks.
This layer holds the mean and square-root of the variance of the distribution over the weights. This layer also has a temperature for cooling (or heating) the posterior.
- Parameters:
input_dim – The input dimension (excluding bias) of this layer.
output_dim – The output dimension of this layer.
num_data – The number of points in the training dataset (used for scaling the KL regulariser).
w_mu – Initial value of the variational mean for weights + bias. If not specified, this defaults to xavier_initialization_numpy for the weights and zero for the bias.
w_sqrt – Initial value of the variational Cholesky of the (co)variance for weights + bias. If not specified, this defaults to 1e-5 * Identity.
activation – The activation function. If not specified, this defaults to the identity.
is_mean_field – Determines whether the approximation to the weight posterior is mean field. Must be consistent with the shape of w_sqrt, if specified.
temperature – The KL loss will be scaled by this factor. Can be used for cooling (< 1.0) or heating (> 1.0) the posterior. As suggested in “How Good is the Bayes Posterior in Deep Neural Networks Really?” by Wenzel et al. (2020), the default value is a cold 1e-4.
- build(input_shape: gpflux.types.ShapeType) None[source]#
Build the variables necessary on the first call.
- predict_samples(inputs: gpflow.base.TensorType, *, num_samples: int | None = None) tf.Tensor[source]#
Samples from the approximate posterior at N test inputs, with input_dim = D, output_dim = Q.
- Parameters:
inputs – The inputs to predict at; shape [N, D].
num_samples – The number of samples S to draw.
- Returns:
Samples, shape [S, N, Q] if S is not None else [N, Q].
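A minimal usage sketch follows. The shapes, the choice of activation, and the explicit build() call are illustrative assumptions, not part of the documented API beyond what is described above:

```python
import numpy as np
import tensorflow as tf
from gpflux.layers import BayesianDenseLayer

input_dim, output_dim, num_data = 3, 2, 100
layer = BayesianDenseLayer(input_dim, output_dim, num_data, activation=tf.nn.relu)

X = np.random.randn(7, input_dim)  # N = 7 test inputs with D = 3
layer.build(X.shape)  # create the variational variables before sampling directly
samples = layer.predict_samples(X, num_samples=5)  # shape [S, N, Q] = [5, 7, 2]
```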
- class GPLayer(kernel: gpflow.kernels.MultioutputKernel, inducing_variable: gpflow.inducing_variables.MultioutputInducingVariables, num_data: int, mean_function: gpflow.mean_functions.MeanFunction | None = None, *, num_samples: int | None = None, full_cov: bool = False, full_output_cov: bool = False, num_latent_gps: int = None, whiten: bool = True, name: str | None = None, verbose: bool = True)[source]#
Bases: tfp.layers.DistributionLambda
A sparse variational multioutput GP layer. This layer holds the kernel, inducing variables, variational distribution, and mean function.
- Parameters:
kernel – The multioutput kernel for this layer.
inducing_variable – The inducing features for this layer.
num_data – The number of points in the training dataset (see num_data).
mean_function – The mean function that will be applied to the inputs. Default: Identity.
Note
The Identity mean function requires the input and output dimensionality of this layer to be the same. If you want to change the dimensionality in a layer, you may want to provide a Linear mean function instead.
num_samples – The number of samples to draw when converting the DistributionLambda into a tf.Tensor; see _convert_to_tensor_fn(). Will be stored in the num_samples attribute. If None (the default), draw a single sample without prefixing the sample shape (see the sample() method of tfp.distributions.Distribution).
full_cov – Sets the default behaviour of calling this layer (the full_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to inputs; if True, predict full covariance over inputs.
full_output_cov – Sets the default behaviour of calling this layer (the full_output_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to outputs; if True, predict full covariance over outputs.
num_latent_gps – The number of (latent) GPs in the layer (which can be different from the number of outputs, e.g. with a LinearCoregionalization kernel). This is used to determine the size of the variational parameters q_mu and q_sqrt. If possible, it is inferred from the kernel and inducing_variable.
whiten – If True (the default), uses the whitened parameterisation of the inducing variables; see whiten.
name – The name of this layer.
verbose – The verbosity mode. Set this parameter to True to show debug information.
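A minimal construction sketch, assuming the convenience constructors construct_basic_kernel and construct_basic_inducing_variables from gpflux.helpers (all dimensions are illustrative):

```python
import gpflow
from gpflux.helpers import construct_basic_kernel, construct_basic_inducing_variables
from gpflux.layers import GPLayer

num_data, input_dim, output_dim, num_inducing = 100, 2, 1, 20

# One SquaredExponential kernel per output, wrapped in a multioutput kernel
kernel = construct_basic_kernel(
    gpflow.kernels.SquaredExponential(), output_dim=output_dim
)
# Inducing variables shared across outputs
inducing_variable = construct_basic_inducing_variables(
    num_inducing, input_dim, output_dim=output_dim, share_variables=True
)
gp_layer = GPLayer(
    kernel, inducing_variable, num_data=num_data, num_latent_gps=output_dim
)
```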
- num_data: int#
The number of points in the training dataset. This information is used to obtain the correct scaling between the data-fit and the KL term in the evidence lower bound (ELBO).
- whiten: bool#
This parameter determines the parameterisation of the inducing variables.
If True, this layer uses the whitened (or non-centred) representation, in which (in the case of inducing point inducing variables) u = f(Z) = cholesky(Kuu) v, and we parameterise an approximate posterior on v as q(v) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on v is p(v) = N(0, I).
If False, this layer uses the non-whitened (or centred) representation, in which we directly parameterise q(u) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on u is p(u) = N(0, Kuu).
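Equivalently, the two parameterisations above can be written (for inducing-point inducing variables, with \(L = \operatorname{cholesky}(K_{uu})\)):
\[
\texttt{whiten=True:}\quad u = L v,\qquad q(v) = \mathcal{N}(\texttt{q\_mu},\ \texttt{q\_sqrt}\,\texttt{q\_sqrt}^\top),\qquad p(v) = \mathcal{N}(0, I)
\]
\[
\texttt{whiten=False:}\quad q(u) = \mathcal{N}(\texttt{q\_mu},\ \texttt{q\_sqrt}\,\texttt{q\_sqrt}^\top),\qquad p(u) = \mathcal{N}(0, K_{uu})
\]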
- num_samples: int | None#
The number of samples drawn when coercing the output distribution of this layer to a tf.Tensor. (See _convert_to_tensor_fn().)
- full_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to inputs. If True, predict or sample with the full covariance over the inputs.
- full_output_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to outputs. If True, predict or sample with the full covariance over the outputs.
- q_mu: gpflow.Parameter#
The mean of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- q_sqrt: gpflow.Parameter#
The lower-triangular Cholesky factor of the covariance of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- predict(inputs: gpflow.base.TensorType, *, full_cov: bool = False, full_output_cov: bool = False) Tuple[tf.Tensor, tf.Tensor][source]#
Make a prediction at N test inputs for the Q outputs of this layer, including the mean function contribution.
The covariance and its shape are determined by full_cov and full_output_cov as follows:

| (co)variance shape | full_output_cov=False | full_output_cov=True |
|---|---|---|
| full_cov=False | [N, Q] | [N, Q, Q] |
| full_cov=True | [Q, N, N] | [N, Q, N, Q] |
- Parameters:
inputs – The inputs to predict at, with a shape of [N, D], where D is the input dimensionality of this layer.
full_cov – Whether to return full covariance (if True) or marginal variance (if False, the default) w.r.t. inputs.
full_output_cov – Whether to return full covariance (if True) or marginal variance (if False, the default) w.r.t. outputs.
- Returns:
The posterior mean (shape [N, Q]) and (co)variance (shape as above) at the test points.
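For example, continuing the GPLayer construction sketch above (a non-authoritative illustration of the shape table):

```python
import numpy as np

X = np.random.randn(5, input_dim)  # N = 5 test points

mean, var = gp_layer.predict(X)                        # mean: [N, Q], var: [N, Q]
mean, cov = gp_layer.predict(X, full_cov=True)         # cov: [Q, N, N]
mean, cov = gp_layer.predict(X, full_output_cov=True)  # cov: [N, Q, Q]
```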
- call(inputs: gpflow.base.TensorType, *args: List[Any], **kwargs: Dict[str, Any]) tf.Tensor[source]#
The default behaviour upon calling this layer.
This method calls the tfp.layers.DistributionLambda super-class call method, which constructs a tfp.distributions.Distribution for the predictive distributions at the input points (see _make_distribution_fn()). You can pass this distribution to tf.convert_to_tensor, which will return samples from the distribution (see _convert_to_tensor_fn()).
This method also adds a layer-specific loss function, given by the KL divergence between this layer and the GP prior (scaled to per-datapoint).
- prior_kl() tf.Tensor[source]#
Returns the KL divergence KL[q(u)∥p(u)] from the prior p(u) to the variational distribution q(u). If this layer uses the whitened representation, returns KL[q(v)∥p(v)].
- _make_distribution_fn(previous_layer_outputs: gpflow.base.TensorType) tfp.distributions.Distribution[source]#
Construct the posterior distributions at the output points of the previous layer, depending on full_cov and full_output_cov.
- Parameters:
previous_layer_outputs – The output from the previous layer, which should be coercible to a tf.Tensor.
- _convert_to_tensor_fn(distribution: tfp.distributions.Distribution) tf.Tensor[source]#
Convert the predictive distributions at the input points (see _make_distribution_fn()) to a tensor of num_samples samples from that distribution. Whether the samples are correlated or marginal (uncorrelated) depends on full_cov and full_output_cov.
- sample() gpflux.sampling.sample.Sample[source]#
Todo
TODO: Document this.
- class LatentVariableLayer(prior: tfp.distributions.Distribution, encoder: gpflow.keras.tf_keras.layers.Layer, compositor: gpflow.keras.tf_keras.layers.Layer | None = None, name: str | None = None)[source]#
Bases: LayerWithObservations
A latent variable layer, with amortized mean-field variational inference.
The latent variable is distribution-agnostic, but assumes a variational posterior that is fully factorised and is of the same distribution family as the prior.
This class is used by models as described in [DSHD18, SDHD19].
- Parameters:
prior – A distribution that represents the prior over the latent variable.
encoder – A layer which is passed the concatenated observation inputs and targets, and returns the appropriate parameters for the approximate posterior distribution; see encoder.
compositor – A layer that combines layer inputs and latent variable samples into a single tensor; see compositor. If you do not specify a value for this parameter, the default is tf.keras.layers.Concatenate(axis=-1, dtype=default_float()). Note that you should set the dtype of the layer to GPflow’s default dtype as in default_float().
name – The name of this layer (passed through to tf.keras.layers.Layer).
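A minimal construction sketch, assuming the DirectlyParameterizedNormalDiag encoder from gpflux.encoders and an illustrative two-dimensional latent space:

```python
import tensorflow as tf
import tensorflow_probability as tfp
from gpflow import default_float
from gpflux.encoders import DirectlyParameterizedNormalDiag
from gpflux.layers import LatentVariableLayer

num_data, latent_dim = 100, 2

# Standard-normal prior over the latent variable, in GPflow's default dtype
prior = tfp.distributions.MultivariateNormalDiag(
    loc=tf.zeros(latent_dim, dtype=default_float()),
    scale_diag=tf.ones(latent_dim, dtype=default_float()),
)
encoder = DirectlyParameterizedNormalDiag(num_data, latent_dim)
lv_layer = LatentVariableLayer(prior=prior, encoder=encoder)
```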
- prior: tfp.distributions.Distribution#
The prior distribution for the latent variables.
- encoder: gpflow.keras.tf_keras.layers.Layer#
An encoder that maps from a concatenation of inputs and targets to the parameters of the approximate posterior distribution of the corresponding latent variables.
- compositor: gpflow.keras.tf_keras.layers.Layer#
A layer that takes as input the two-element [layer_inputs, latent_variable_samples] list and combines the elements into a single output tensor.
- call(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType | None = None, training: bool | None = None, seed: int | None = None) tf.Tensor[source]#
Sample the latent variables and compose them with the layer input.
When training, draw a sample of the latent variable from the posterior, whose distribution is parameterised by the encoder mapping from the data. Also add a KL divergence [posterior∥prior] to the losses.
When not training, draw a sample of the latent variable from the prior.
- Parameters:
layer_inputs – The output of the previous layer.
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively. This parameter should be passed only when in training mode.
training – The training mode indicator.
seed – A random seed for the sampling operation.
- Returns:
Samples of the latent variable composed with the layer inputs through the compositor.
- _inference_posteriors(observations: gpflux.types.ObservationType, training: bool | None = None) tfp.distributions.Distribution[source]#
Return the posterior distributions parametrised by the encoder, which gets called with the concatenation of the inputs and targets in the observations argument.
Todo
We might want to change encoders to have a tfp.layers.DistributionLambda final layer that directly returns the appropriately parameterised distributions object.
- Parameters:
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively.
training – The training mode indicator (passed through to the encoder’s call).
- Returns:
The posterior distributions object.
- _inference_latent_samples_and_loss(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType, seed: int | None = None) Tuple[tf.Tensor, tf.Tensor][source]#
Sample latent variables during the training forward pass, hence requiring the observations. Also return the KL loss per datapoint.
- Parameters:
layer_inputs – The output of the previous layer (unused).
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively.
seed – A random seed for the sampling operation.
- Returns:
The samples and the loss-per-datapoint.
- _prediction_latent_samples(layer_inputs: gpflow.base.TensorType, seed: int | None = None) tf.Tensor[source]#
Sample latent variables during the prediction forward pass, only depending on the shape of this layer’s inputs.
- Parameters:
layer_inputs – The output of the previous layer (for determining batch shape).
seed – A random seed for the sampling operation.
- Returns:
The samples.
- _local_kls(posteriors: tfp.distributions.Distribution) tf.Tensor[source]#
Compute the KL divergences [posteriors∥prior].
- Parameters:
posteriors – A distribution that represents the approximate posteriors.
- Returns:
The KL divergences from the prior for each of the posteriors.
- class LayerWithObservations[source]#
Bases: gpflux.layers.trackable_layer.TrackableLayer
By inheriting from this class, layers indicate that their call() method takes a second observations argument after the customary layer_inputs argument.
This is used to distinguish which layers (unlike most standard Keras layers) require the original inputs and/or targets during training. For example, it is used by the amortized variational inference in the LatentVariableLayer.
- abstract call(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType | None = None, training: bool | None = None) tf.Tensor[source]#
The call() method of LayerWithObservations subclasses should accept a second argument, observations. In training mode, this will be the [inputs, targets] of the training points; otherwise, it is None.
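A schematic subclass, shown only to illustrate the expected call signature (the body is a placeholder pass-through and not part of the library):

```python
import tensorflow as tf
from gpflux.layers import LayerWithObservations

class MyObservationsLayer(LayerWithObservations):
    """Illustrative subclass: accepts observations as the second argument."""

    def call(self, layer_inputs, observations=None, training=None):
        if training and observations is not None:
            inputs, targets = observations  # the training-time [inputs, targets]
            # ... use `targets` here, e.g. for amortized inference ...
        return tf.identity(layer_inputs)  # placeholder pass-through
```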
- class LikelihoodLayer(likelihood: gpflow.likelihoods.Likelihood)[source]#
Bases: gpflux.layers.trackable_layer.TrackableLayer
A Keras layer that wraps a GPflow Likelihood. This layer expects a tfp.distributions.MultivariateNormalDiag as its input, describing q(f). When training, calling this class computes the negative variational expectation \(-\mathbb{E}_{q(f)}[\log p(y|f)]\) and adds it as a layer loss. When not training, it computes the mean and variance of y under q(f) using predict_mean_and_var().
Note
Use either this LikelihoodLayer (together with gpflux.models.DeepGP) or LikelihoodLoss (e.g. together with a tf.keras.Sequential model). Do not use both at once, because this would add the loss twice.
- call(inputs: tfp.distributions.MultivariateNormalDiag, targets: gpflow.base.TensorType | None = None, training: bool = None) LikelihoodOutputs[source]#
When training (training=True), this method computes variational expectations (data-fit loss) and adds this information as a layer loss. When testing (the default), it computes the posterior mean and variance of y.
- Parameters:
inputs – The output distribution of the previous layer. This is currently expected to be a MultivariateNormalDiag; that is, the preceding GPLayer should have full_cov=full_output_cov=False.
- Returns:
A LikelihoodOutputs tuple with the mean and variance of f and, if not training, the mean and variance of y.
Todo
Turn this layer into a DistributionLambda as well and return the correct Distribution instead of a tuple containing mean and variance only.
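A minimal sketch of the LikelihoodLayer route, reusing the gp_layer from the GPLayer sketch above (the Gaussian likelihood and its variance are illustrative assumptions):

```python
import gpflow
from gpflux.layers import LikelihoodLayer
from gpflux.models import DeepGP

likelihood_layer = LikelihoodLayer(gpflow.likelihoods.Gaussian(variance=0.1))
deep_gp = DeepGP([gp_layer], likelihood_layer)  # single-layer "deep" GP, for illustration
```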
- class TrackableLayer[source]#
Bases: gpflow.keras.tf_keras.layers.Layer
With the release of TensorFlow 2.5, our TrackableLayer workaround is no longer needed. See Prowler-io/gpflux#189. This class will be removed in GPflux version 1.0.0.