gpflux.layers#
Layers
Package Contents#
- class BayesianDenseLayer(input_dim: int, output_dim: int, num_data: int, w_mu: numpy.ndarray | None = None, w_sqrt: numpy.ndarray | None = None, activation: Callable | None = None, is_mean_field: bool = True, temperature: float = 0.0001)[source]#
Bases:
gpflux.layers.trackable_layer.TrackableLayer
A dense (fully-connected) layer for variational Bayesian neural networks.
This layer holds the mean and square root of the variance of the distribution over the weights. It also has a temperature for cooling (or heating) the posterior.
- Parameters:
input_dim – The input dimension (excluding bias) of this layer.
output_dim – The output dimension of this layer.
num_data – The number of points in the training dataset (used for scaling the KL regulariser).
w_mu – Initial value of the variational mean for weights + bias. If not specified, this defaults to xavier_initialization_numpy for the weights and zero for the bias.
w_sqrt – Initial value of the variational Cholesky of the (co)variance for weights + bias. If not specified, this defaults to 1e-5 * Identity.
activation – The activation function. If not specified, this defaults to the identity.
is_mean_field – Determines whether the approximation to the weight posterior is mean field. Must be consistent with the shape of w_sqrt, if specified.
temperature – The KL loss will be scaled by this factor. Can be used for cooling (< 1.0) or heating (> 1.0) the posterior. As suggested in “How Good is the Bayes Posterior in Deep Neural Networks Really?” by Wenzel et al. (2020), the default value is a cold 1e-4.
- build(input_shape: gpflux.types.ShapeType) None [source]#
Build the variables necessary on first call.
- predict_samples(inputs: gpflow.base.TensorType, *, num_samples: int | None = None) tf.Tensor [source]#
Samples from the approximate posterior at N test inputs, with input_dim = D, output_dim = Q.
- Parameters:
inputs – The inputs to predict at; shape [N, D].
num_samples – The number of samples, S, to draw.
- Returns:
Samples, shape [S, N, Q] if S is not None else [N, Q].
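For illustration, a minimal sketch of constructing a BayesianDenseLayer and drawing posterior samples; the data, dimensions and hyperparameters below are assumptions for the example, not part of the API:

import numpy as np

from gpflux.layers import BayesianDenseLayer

X = np.random.randn(100, 3)  # hypothetical training inputs: N = 100, D = 3

layer = BayesianDenseLayer(
    input_dim=3,
    output_dim=2,
    num_data=len(X),       # scales the KL regulariser
    is_mean_field=True,    # mean-field approximate posterior over the weights
)
layer.build(X.shape)       # create the variational variables (see build() above)

# Draw S = 5 samples from the approximate posterior at the inputs;
# the result has shape [S, N, Q] = [5, 100, 2].
samples = layer.predict_samples(X, num_samples=5)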
- class GPLayer(kernel: gpflow.kernels.MultioutputKernel, inducing_variable: gpflow.inducing_variables.MultioutputInducingVariables, num_data: int, mean_function: gpflow.mean_functions.MeanFunction | None = None, *, num_samples: int | None = None, full_cov: bool = False, full_output_cov: bool = False, num_latent_gps: int = None, whiten: bool = True, name: str | None = None, verbose: bool = True)[source]#
Bases:
tfp.layers.DistributionLambda
A sparse variational multioutput GP layer. This layer holds the kernel, inducing variables, variational distribution, and mean function.
- Parameters:
kernel – The multioutput kernel for this layer.
inducing_variable – The inducing features for this layer.
num_data – The number of points in the training dataset (see num_data).
mean_function – The mean function that will be applied to the inputs. Default: Identity.
Note: The Identity mean function requires the input and output dimensionality of this layer to be the same. If you want to change the dimensionality in a layer, you may want to provide a Linear mean function instead.
num_samples – The number of samples to draw when converting the DistributionLambda into a tf.Tensor; see _convert_to_tensor_fn(). Will be stored in the num_samples attribute. If None (the default), draw a single sample without prefixing the sample shape (see tfp.distributions.Distribution’s sample() method).
full_cov – Sets the default behaviour of calling this layer (full_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to inputs; if True, predict full covariance over inputs.
full_output_cov – Sets the default behaviour of calling this layer (full_output_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to outputs; if True, predict full covariance over outputs.
num_latent_gps – The number of (latent) GPs in the layer (which can be different from the number of outputs, e.g. with a LinearCoregionalization kernel). This is used to determine the size of the variational parameters q_mu and q_sqrt. If possible, it is inferred from the kernel and inducing_variable.
whiten – If True (the default), uses the whitened parameterisation of the inducing variables; see whiten.
name – The name of this layer.
verbose – The verbosity mode. Set this parameter to True to show debug information.
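As an illustration, a minimal sketch of constructing a GPLayer from a shared-independent multioutput kernel and inducing variables; the dimensions, kernel choice and inducing-point locations below are assumptions for the example:

import numpy as np
import gpflow

from gpflux.layers import GPLayer

num_data, input_dim, output_dim, num_inducing = 200, 2, 1, 20

# Shared-independent multioutput kernel and inducing variables from GPflow.
kernel = gpflow.kernels.SharedIndependent(
    gpflow.kernels.SquaredExponential(), output_dim=output_dim
)
Z = np.random.randn(num_inducing, input_dim)  # hypothetical inducing locations
inducing_variable = gpflow.inducing_variables.SharedIndependentInducingVariables(
    gpflow.inducing_variables.InducingPoints(Z)
)

gp_layer = GPLayer(
    kernel,
    inducing_variable,
    num_data=num_data,          # scales the KL term in the ELBO
    num_latent_gps=output_dim,  # sizes q_mu and q_sqrt
)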
- num_data: int#
The number of points in the training dataset. This information is used to obtain the correct scaling between the data-fit and the KL term in the evidence lower bound (ELBO).
- whiten: bool#
This parameter determines the parameterisation of the inducing variables.
If True, this layer uses the whitened (or non-centred) representation, in which (in the case of inducing point inducing variables) u = f(Z) = cholesky(Kuu) v, and we parameterise an approximate posterior on v as q(v) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on v is p(v) = N(0, I).
If False, this layer uses the non-whitened (or centred) representation, in which we directly parameterise q(u) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on u is p(u) = N(0, Kuu).
- num_samples: int | None#
The number of samples drawn when coercing the output distribution of this layer to a tf.Tensor. (See _convert_to_tensor_fn().)
- full_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to inputs. If True, predict or sample with the full covariance over the inputs.
- full_output_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to outputs. If True, predict or sample with the full covariance over the outputs.
- q_mu: gpflow.Parameter#
The mean of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- q_sqrt: gpflow.Parameter#
The lower-triangular Cholesky factor of the covariance of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- predict(inputs: gpflow.base.TensorType, *, full_cov: bool = False, full_output_cov: bool = False) Tuple[tf.Tensor, tf.Tensor] [source]#
Make a prediction at N test inputs for the Q outputs of this layer, including the mean function contribution.
The covariance and its shape are determined by full_cov and full_output_cov as follows:
(co)variance shape      full_output_cov=False    full_output_cov=True
full_cov=False          [N, Q]                   [N, Q, Q]
full_cov=True           [Q, N, N]                [N, Q, N, Q]
- Parameters:
inputs – The inputs to predict at, with a shape of [N, D], where D is the input dimensionality of this layer.
full_cov – Whether to return full covariance (if True) or marginal variance (if False, the default) w.r.t. inputs.
full_output_cov – Whether to return full covariance (if True) or marginal variance (if False, the default) w.r.t. outputs.
- Returns:
The posterior mean (shape [N, Q]) and (co)variance (shape as above) at the test points.
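Continuing the GPLayer sketch above, a hypothetical call illustrating the covariance shapes in the table (test inputs are assumptions):

X_test = np.random.randn(5, 2)                        # N = 5 test inputs, D = 2

mean, var = gp_layer.predict(X_test)                  # mean: [5, 1], var: [5, 1]
mean, cov = gp_layer.predict(X_test, full_cov=True)   # cov: [1, 5, 5]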
- call(inputs: gpflow.base.TensorType, *args: List[Any], **kwargs: Dict[str, Any]) tf.Tensor [source]#
The default behaviour upon calling this layer.
This method calls the tfp.layers.DistributionLambda super-class call method, which constructs a tfp.distributions.Distribution for the predictive distributions at the input points (see _make_distribution_fn()). You can pass this distribution to tf.convert_to_tensor, which will return samples from the distribution (see _convert_to_tensor_fn()).
This method also adds a layer-specific loss function, given by the KL divergence between this layer and the GP prior (scaled per datapoint).
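A small sketch of this behaviour, continuing the example above (variable names are assumptions):

import tensorflow as tf

f_dist = gp_layer(X_test)                 # a tfp.distributions.Distribution over f
f_sample = tf.convert_to_tensor(f_dist)   # num_samples is None: one sample, shape [5, 1]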
- prior_kl() tf.Tensor [source]#
Returns the KL divergence KL[q(u)∥p(u)] from the prior p(u) to the variational distribution q(u). If this layer uses the whitened representation, returns KL[q(v)∥p(v)].
- _make_distribution_fn(previous_layer_outputs: gpflow.base.TensorType) tfp.distributions.Distribution [source]#
Construct the posterior distributions at the output points of the previous layer, depending on full_cov and full_output_cov.
- Parameters:
previous_layer_outputs – The output from the previous layer, which should be coercible to a tf.Tensor.
- _convert_to_tensor_fn(distribution: tfp.distributions.Distribution) tf.Tensor [source]#
Convert the predictive distributions at the input points (see _make_distribution_fn()) to a tensor of num_samples samples from that distribution. Whether the samples are correlated or marginal (uncorrelated) depends on full_cov and full_output_cov.
- sample() gpflux.sampling.sample.Sample [source]#
Todo
TODO: Document this.
- class LatentVariableLayer(prior: tfp.distributions.Distribution, encoder: gpflow.keras.tf_keras.layers.Layer, compositor: gpflow.keras.tf_keras.layers.Layer | None = None, name: str | None = None)[source]#
Bases:
LayerWithObservations
A latent variable layer, with amortized mean-field variational inference.
The latent variable is distribution-agnostic, but assumes a variational posterior that is fully factorised and is of the same distribution family as the prior.
This class is used by models as described in [DSHD18, SDHD19].
- Parameters:
prior – A distribution that represents the prior over the latent variable.
encoder – A layer which is passed the concatenated observation inputs and targets, and returns the appropriate parameters for the approximate posterior distribution; see encoder.
compositor – A layer that combines layer inputs and latent variable samples into a single tensor; see compositor. If you do not specify a value for this parameter, the default is tf.keras.layers.Concatenate(axis=-1, dtype=default_float()). Note that you should set the dtype of the layer to GPflow’s default dtype as in default_float().
name – The name of this layer (passed through to tf.keras.layers.Layer).
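A minimal sketch of constructing a LatentVariableLayer with a standard-normal prior; the DirectlyParameterizedNormalDiag encoder from gpflux.encoders is used here for brevity, and the sizes are illustrative assumptions:

import tensorflow as tf
import tensorflow_probability as tfp
from gpflow import default_float

from gpflux.encoders import DirectlyParameterizedNormalDiag
from gpflux.layers import LatentVariableLayer

num_data, latent_dim = 100, 2

# Standard-normal prior over the latent variable, in GPflow's default dtype.
prior = tfp.distributions.MultivariateNormalDiag(
    loc=tf.zeros(latent_dim, dtype=default_float()),
    scale_diag=tf.ones(latent_dim, dtype=default_float()),
)
encoder = DirectlyParameterizedNormalDiag(num_data, latent_dim)

lv_layer = LatentVariableLayer(prior=prior, encoder=encoder)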
- prior: tfp.distributions.Distribution#
The prior distribution for the latent variables.
- encoder: gpflow.keras.tf_keras.layers.Layer#
An encoder that maps from a concatenation of inputs and targets to the parameters of the approximate posterior distribution of the corresponding latent variables.
- compositor: gpflow.keras.tf_keras.layers.Layer#
A layer that takes as input the two-element [layer_inputs, latent_variable_samples] list and combines the elements into a single output tensor.
- call(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType | None = None, training: bool | None = None, seed: int | None = None) tf.Tensor [source]#
Sample the latent variables and compose them with the layer input.
When training, draw a sample of the latent variable from the posterior, whose distribution is parameterised by the encoder mapping from the data. Also add a KL divergence [posterior∥prior] to the losses.
When not training, draw a sample of the latent variable from the prior.
- Parameters:
layer_inputs – The output of the previous layer.
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively. This parameter should be passed only when in training mode.
training – The training mode indicator.
seed – A random seed for the sampling operation.
- Returns:
Samples of the latent variable composed with the layer inputs through the compositor.
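Continuing the sketch above, a hypothetical call in training and prediction mode (shapes and data are assumptions):

import numpy as np

X = np.random.randn(num_data, 5)   # layer inputs, Din = 5
Y = np.random.randn(num_data, 1)   # targets, Dout = 1

# Training mode: sample from the encoder-parameterised posterior;
# a KL [posterior∥prior] loss is added to the layer's losses.
out_train = lv_layer(X, observations=[X, Y], training=True)  # shape [num_data, 5 + latent_dim]

# Prediction mode: sample from the prior instead.
out_predict = lv_layer(X)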
- _inference_posteriors(observations: gpflux.types.ObservationType, training: bool | None = None) tfp.distributions.Distribution [source]#
Return the posterior distributions parametrised by the encoder, which gets called with the concatenation of the inputs and targets in the observations argument.
Todo
We might want to change encoders to have a tfp.layers.DistributionLambda final layer that directly returns the appropriately parameterised distributions object.
- Parameters:
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively.
training – The training mode indicator (passed through to the encoder’s call).
- Returns:
The posterior distributions object.
- _inference_latent_samples_and_loss(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType, seed: int | None = None) Tuple[tf.Tensor, tf.Tensor] [source]#
Sample latent variables during the training forward pass, hence requiring the observations. Also return the KL loss per datapoint.
- Parameters:
layer_inputs – The output of the previous layer (unused).
observations – The [inputs, targets], with the shapes [batch size, Din] and [batch size, Dout] respectively.
seed – A random seed for the sampling operation.
- Returns:
The samples and the loss-per-datapoint.
- _prediction_latent_samples(layer_inputs: gpflow.base.TensorType, seed: int | None = None) tf.Tensor [source]#
Sample latent variables during the prediction forward pass, only depending on the shape of this layer’s inputs.
- Parameters:
layer_inputs – The output of the previous layer (for determining batch shape).
seed – A random seed for the sampling operation.
- Returns:
The samples.
- _local_kls(posteriors: tfp.distributions.Distribution) tf.Tensor [source]#
Compute the KL divergences [posteriors∥prior].
- Parameters:
posteriors – A distribution that represents the approximate posteriors.
- Returns:
The KL divergences from the prior for each of the posteriors.
- class LayerWithObservations[source]#
Bases:
gpflux.layers.trackable_layer.TrackableLayer
By inheriting from this class, Layers indicate that their call() method takes a second observations argument after the customary layer_inputs argument.
This is used to distinguish which layers (unlike most standard Keras layers) require the original inputs and/or targets during training. For example, it is used by the amortized variational inference in the LatentVariableLayer.
- abstract call(layer_inputs: gpflow.base.TensorType, observations: gpflux.types.ObservationType | None = None, training: bool | None = None) tf.Tensor [source]#
The call() method of LayerWithObservations subclasses should accept a second argument, observations. In training mode, this will be the [inputs, targets] of the training points; otherwise, it is None.
- class LikelihoodLayer(likelihood: gpflow.likelihoods.Likelihood)[source]#
Bases:
gpflux.layers.trackable_layer.TrackableLayer
A Keras layer that wraps a GPflow Likelihood. This layer expects a tfp.distributions.MultivariateNormalDiag as its input, describing q(f). When training, calling this class computes the negative variational expectation \(-\mathbb{E}_{q(f)}[\log p(y|f)]\) and adds it as a layer loss. When not training, it computes the mean and variance of y under q(f) using predict_mean_and_var().
Note
Use either this LikelihoodLayer (together with gpflux.models.DeepGP) or LikelihoodLoss (e.g. together with a tf.keras.Sequential model). Do not use both at once because this would add the loss twice.
- call(inputs: tfp.distributions.MultivariateNormalDiag, targets: gpflow.base.TensorType | None = None, training: bool = None) LikelihoodOutputs [source]#
When training (training=True), this method computes variational expectations (data-fit loss) and adds this information as a layer loss. When testing (the default), it computes the posterior mean and variance of y.
- Parameters:
inputs – The output distribution of the previous layer. This is currently expected to be a MultivariateNormalDiag; that is, the preceding GPLayer should have full_cov=full_output_cov=False.
- Returns:
A LikelihoodOutputs tuple with the mean and variance of f and, if not training, the mean and variance of y.
Todo
Turn this layer into a DistributionLambda as well and return the correct Distribution instead of a tuple containing mean and variance only.
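As a sketch of the pattern recommended in the note above, wrapping a GPflow Gaussian likelihood and combining it with a GP layer in a gpflux.models.DeepGP; the gp_layer variable and the hyperparameters are assumptions carried over from the earlier GPLayer example:

import gpflow

from gpflux.layers import LikelihoodLayer
from gpflux.models import DeepGP

likelihood_layer = LikelihoodLayer(gpflow.likelihoods.Gaussian(variance=0.1))

# The LikelihoodLayer adds the variational-expectation loss during training,
# so no separate LikelihoodLoss should be used with this model.
deep_gp = DeepGP([gp_layer], likelihood_layer)
training_model = deep_gp.as_training_model()
training_model.compile(optimizer="adam")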
- class TrackableLayer[source]#
Bases:
gpflow.keras.tf_keras.layers.Layer
With the release of TensorFlow 2.5, our TrackableLayer workaround is no longer needed. See Prowler-io/gpflux#189. This will be removed in GPflux version 1.0.0.