gpflux.layers.bayesian_dense_layer

This module provides BayesianDenseLayer, which implements a variational Bayesian dense (fully-connected) neural network layer as a Keras Layer.

Module Contents

xavier_initialization_numpy(input_dim: int, output_dim: int) → numpy.ndarray

Generate initial weights for a neural network layer with the given input and output dimensionality using the Xavier Glorot normal initialiser. From:

Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2010.

Draw samples from a normal distribution centred on $0$ with standard deviation $\sqrt{2 / (\text{input\_dim} + \text{output\_dim})}$.
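The sampling rule above can be sketched in plain NumPy. This is a minimal illustration and not the GPflux source: the name glorot_normal_sketch, the seed argument, and the [input_dim, output_dim] return shape are assumptions made for the example.

```python
import numpy as np


def glorot_normal_sketch(input_dim, output_dim, seed=None):
    """Hypothetical stand-in for a Glorot normal initialiser (not the GPflux code)."""
    rng = np.random.default_rng(seed)
    # Standard deviation from the formula above: sqrt(2 / (input_dim + output_dim)).
    stddev = np.sqrt(2.0 / (input_dim + output_dim))
    # Assumption: one weight per (input, output) pair, so shape [input_dim, output_dim].
    return rng.normal(loc=0.0, scale=stddev, size=(input_dim, output_dim))


weights = glorot_normal_sketch(300, 200, seed=0)
print(weights.shape)  # (300, 200)
```

With many entries, the empirical standard deviation of the returned array should sit close to the target value, which is a quick sanity check on any initialiser of this kind.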

class TrackableLayer

Bases: gpflow.keras.tf_keras.layers.Layer

With the release of TensorFlow 2.5, our TrackableLayer workaround is no longer needed. See Prowler-io/gpflux#189. It will be removed in GPflux version 1.0.0.

ShapeType: Union of valid types for describing the shape of a tf.Tensor(-like) object.

class BayesianDenseLayer(input_dim: int, output_dim: int, num_data: int, w_mu: numpy.ndarray | None = None, w_sqrt: numpy.ndarray | None = None, activation: Callable | None = None, is_mean_field: bool = True, temperature: float = 0.0001)

Bases: gpflux.layers.trackable_layer.TrackableLayer

A dense (fully-connected) layer for variational Bayesian neural networks.

This layer holds the mean and square-root of the variance of the distribution over the weights. This layer also has a temperature for cooling (or heating) the posterior.

Parameters:

input_dim – The input dimension (excluding bias) of this layer.
output_dim – The output dimension of this layer.
num_data – The number of points in the training dataset (used for scaling the KL regulariser).
w_mu – Initial value of the variational mean for weights + bias. If not specified, this defaults to xavier_initialization_numpy for the weights and zero for the bias.
w_sqrt – Initial value of the variational Cholesky of the (co)variance for weights + bias. If not specified, this defaults to 1e-5 * Identity.
activation – The activation function. If not specified, this defaults to the identity.
is_mean_field – Determines whether the approximation to the weight posterior is mean field. Must be consistent with the shape of w_sqrt, if specified.
temperature – The KL loss will be scaled by this factor. Can be used for cooling (< 1.0) or heating (> 1.0) the posterior. As suggested in “How Good is the Bayes Posterior in Deep Neural Networks Really?” by Wenzel et al. (2020), the default value is a cold 1e-4.
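To make the roles of num_data and temperature concrete, the arithmetic below sketches one plausible way the KL regulariser could be scaled. It assumes the KL is multiplied by temperature and divided by num_data to give a per-datapoint loss; the parameter descriptions only say that both are used for scaling, so the exact combination and the name scaled_kl_sketch are assumptions.

```python
def scaled_kl_sketch(kl, num_data, temperature=1e-4):
    """Hypothetical helper: tempered KL expressed per training point."""
    # Assumption: temperature multiplies the KL (values < 1.0 cool the posterior),
    # and dividing by num_data matches a data term averaged over a minibatch.
    return temperature * kl / num_data


per_point = scaled_kl_sketch(kl=250.0, num_data=1000)
print(per_point)
```

With the default cold temperature the regulariser contributes very little per datapoint, so the data-fit term dominates training.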

build(input_shape: gpflux.types.ShapeType) → None: Build the variables necessary on first call.

predict_samples(inputs: gpflow.base.TensorType, *, num_samples: int | None = None) → tf.Tensor

Samples from the approximate posterior at N test inputs, with input_dim = D, output_dim = Q.

Parameters:

inputs – The inputs to predict at; shape [N, D].
num_samples – The number of samples, S, to draw.

Returns:

Samples, with shape [S, N, Q] if num_samples is not None, otherwise [N, Q].
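The shape contract above can be illustrated with a NumPy sketch of mean-field weight sampling. This is not the GPflux implementation: the function name predict_samples_sketch, the flattened [(D + 1) * Q] layout of w_mu and w_sqrt, and the handling of the bias through an appended column of ones are all assumptions made for the example.

```python
import numpy as np


def predict_samples_sketch(inputs, w_mu, w_sqrt, output_dim,
                           num_samples=None, activation=None, seed=None):
    """Hypothetical mean-field sampler returning [S, N, Q] or [N, Q]."""
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    dim = (d + 1) * output_dim  # weights plus one bias row (assumed layout)
    s = 1 if num_samples is None else num_samples
    # Reparameterisation: w = mu + sqrt * eps, with eps ~ N(0, I).
    eps = rng.standard_normal((s, dim))
    w = (w_mu + eps * w_sqrt).reshape(s, d + 1, output_dim)
    # Append a column of ones so the last weight row acts as the bias.
    x = np.concatenate([inputs, np.ones((n, 1))], axis=-1)  # [N, D + 1]
    out = np.einsum("nd,sdq->snq", x, w)                    # [S, N, Q]
    if activation is not None:
        out = activation(out)
    # Drop the sample axis when no sample count was requested.
    return out[0] if num_samples is None else out


X = np.zeros((4, 3))                    # N = 4, D = 3
mu = np.zeros((3 + 1) * 2)              # Q = 2
sqrt = 1e-5 * np.ones((3 + 1) * 2)
batch = predict_samples_sketch(X, mu, sqrt, output_dim=2, num_samples=5, seed=0)
single = predict_samples_sketch(X, mu, sqrt, output_dim=2, seed=0)
print(batch.shape, single.shape)  # (5, 4, 2) (4, 2)
```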

call(inputs: gpflow.base.TensorType, training: bool | None = False) → tf.Tensor | gpflow.models.model.MeanAndVariance: The default behaviour upon calling this layer.

prior_kl() → tf.Tensor: Returns the KL divergence KL[q(u)∥p(u)] from the prior p(u) = N(0, I) to the variational distribution q(u) = N(w_mu, w_sqrt²).
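For the mean-field case, this divergence has a standard closed form: KL = 0.5 · Σ (w_sqrt² + w_mu² − 1 − log w_sqrt²), summed over all weight and bias entries. The NumPy sketch below evaluates that expression; the function name mean_field_kl_sketch and the flat vector layout are assumptions, and the library itself may compute the quantity differently (for example, through a shared Gaussian KL utility).

```python
import numpy as np


def mean_field_kl_sketch(w_mu, w_sqrt):
    """Hypothetical helper: KL[N(w_mu, diag(w_sqrt**2)) || N(0, I)]."""
    var = w_sqrt ** 2
    # Closed form for a diagonal Gaussian against a standard normal prior.
    return 0.5 * np.sum(var + w_mu ** 2 - 1.0 - np.log(var))


kl_zero = mean_field_kl_sketch(np.zeros(6), np.ones(6))    # q equals the prior
kl_shifted = mean_field_kl_sketch(np.ones(6), np.ones(6))  # mean moved to 1
print(kl_zero, kl_shifted)  # 0.0 3.0
```

The divergence vanishes when q matches the prior and grows as the variational mean moves away from zero or the variance departs from one.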