gpflux.layers.gp_layer#
This module provides GPLayer, which implements a Sparse Variational Multioutput Gaussian Process as a Keras Layer.
Module Contents#
- exception GPLayerIncompatibilityException[source]#
Bases: Exception
This exception is raised when GPLayer is misconfigured. This can have several causes, but common misconfigurations are:
- An incompatible or wrong type of Kernel, InducingVariables and/or MeanFunction.
- An incompatible number of latent GPs specified.
- _cholesky_with_jitter(cov: gpflow.base.TensorType) tf.Tensor [source]#
Compute the Cholesky factor of the covariance, adding jitter (determined by gpflow.default_jitter()) to the diagonal to improve numerical stability.
- Parameters:
cov – full covariance with shape [..., N, D, D].
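The idea can be sketched in NumPy (a simplified, batch-free analogue of the TensorFlow implementation; the fixed jitter value here stands in for gpflow.default_jitter()):

```python
import numpy as np

def cholesky_with_jitter(cov: np.ndarray, jitter: float = 1e-6) -> np.ndarray:
    """Cholesky factor of cov (shape [..., D, D]) with jitter added to the diagonal."""
    d = cov.shape[-1]
    return np.linalg.cholesky(cov + jitter * np.eye(d))

# A rank-deficient covariance on which a plain Cholesky decomposition fails:
cov = np.ones((3, 3))
L = cholesky_with_jitter(cov)
# L reconstructs the jittered covariance: L @ L.T equals cov + 1e-6 * np.eye(3)
```

The jitter slightly perturbs the matrix, but it guarantees positive definiteness for (numerically) singular covariances.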
- verify_compatibility(kernel: gpflow.kernels.MultioutputKernel, mean_function: gpflow.mean_functions.MeanFunction, inducing_variable: gpflow.inducing_variables.MultioutputInducingVariables) Tuple[int, int] [source]#
Checks that the arguments are all compatible with each other for use in a GPLayer.
- Parameters:
kernel – The multioutput kernel for the layer.
inducing_variable – The inducing features for the layer.
mean_function – The mean function applied to the inputs.
- Raises:
GPLayerIncompatibilityException – If an incompatibility is detected.
- Returns:
The number of inducing variables and the number of latent GPs.
- class Sample[source]#
Bases: abc.ABC
This class represents a sample from a GP that you can evaluate, via __call__, at new locations within the support of the GP.
Importantly, the same function draw (sample) is evaluated when calling it multiple times. This property is called consistency. Achieving consistency for vanilla GPs is costly because it scales cubically with the number of evaluation points, but it works with any kernel. It is implemented in _efficient_sample_conditional_gaussian(). For KernelWithFeatureDecomposition, the more efficient approach following Wilson et al. [WBT+20] is implemented in _efficient_sample_matheron_rule().
See the tutorial notebooks Efficient sampling and Weight Space Approximation with Random Fourier Features for an in-depth overview.
- abstract __call__(X: gpflow.base.TensorType) tf.Tensor [source]#
Return the evaluation of the GP sample \(f(X)\) for \(f \sim GP(0, k)\).
- Parameters:
X – The inputs, a tensor with the shape [N, D], where D is the input dimensionality.
- Returns:
Function values, a tensor with the shape [N, P], where P is the output dimensionality.
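Why a feature decomposition yields consistency can be sketched in NumPy: once the random weights of a weight-space sample are drawn, f(x) = φ(x)ᵀw is an ordinary deterministic function, so repeated evaluations agree. The random Fourier feature construction below is illustrative, not the gpflux API:

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, input_dim = 100, 1

# Random Fourier features approximating a squared exponential kernel.
omega = rng.normal(size=(num_features, input_dim))        # spectral frequencies
phase = rng.uniform(0, 2 * np.pi, size=num_features)      # random phases
w = rng.normal(size=num_features)                         # weights drawn ONCE per sample

def phi(X: np.ndarray) -> np.ndarray:
    return np.sqrt(2.0 / num_features) * np.cos(X @ omega.T + phase)

def f(X: np.ndarray) -> np.ndarray:
    """An (approximate) GP sample: deterministic once w is fixed."""
    return phi(X) @ w

X = np.linspace(-1.0, 1.0, 5)[:, None]
same = np.array_equal(f(X), f(X))  # True: the same draw is evaluated both times
```

Evaluation cost is linear in the number of points, in contrast to the cubic cost of sampling a vanilla GP consistently.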
- class GPLayer(kernel: gpflow.kernels.MultioutputKernel, inducing_variable: gpflow.inducing_variables.MultioutputInducingVariables, num_data: int, mean_function: gpflow.mean_functions.MeanFunction | None = None, *, num_samples: int | None = None, full_cov: bool = False, full_output_cov: bool = False, num_latent_gps: int = None, whiten: bool = True, name: str | None = None, verbose: bool = True)[source]#
Bases: tfp.layers.DistributionLambda
A sparse variational multioutput GP layer. This layer holds the kernel, the inducing variables, the variational distribution, and the mean function.
- Parameters:
kernel – The multioutput kernel for this layer.
inducing_variable – The inducing features for this layer.
num_data – The number of points in the training dataset (see num_data).
mean_function – The mean function that will be applied to the inputs. Default: Identity.
Note
The Identity mean function requires the input and output dimensionality of this layer to be the same. If you want to change the dimensionality in a layer, you may want to provide a Linear mean function instead.
num_samples – The number of samples to draw when converting the DistributionLambda into a tf.Tensor; see _convert_to_tensor_fn(). Will be stored in the num_samples attribute. If None (the default), draw a single sample without prefixing the sample shape (see the sample() method of tfp.distributions.Distribution).
full_cov – Sets the default behaviour of calling this layer (the full_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to inputs; if True, predict the full covariance over inputs.
full_output_cov – Sets the default behaviour of calling this layer (the full_output_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to outputs; if True, predict the full covariance over outputs.
num_latent_gps – The number of (latent) GPs in the layer (which can be different from the number of outputs, e.g. with a LinearCoregionalization kernel). This is used to determine the size of the variational parameters q_mu and q_sqrt. If possible, it is inferred from the kernel and inducing variable.
whiten – If True (the default), uses the whitened parameterisation of the inducing variables; see whiten.
name – The name of this layer.
verbose – The verbosity mode. Set this parameter to True to show debug information.
- num_data: int[source]#
The number of points in the training dataset. This information is used to obtain the correct scaling between the data-fit and the KL term in the evidence lower bound (ELBO).
- whiten: bool[source]#
This parameter determines the parameterisation of the inducing variables.
If True, this layer uses the whitened (or non-centred) representation, in which (taking inducing point inducing variables as an example) u = f(Z) = cholesky(Kuu) v, and we parameterise an approximate posterior on v as q(v) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on v is p(v) = N(0, I).
If False, this layer uses the non-whitened (or centred) representation, in which we directly parameterise q(u) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on u is p(u) = N(0, Kuu).
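The two parameterisations are related by the linear map u = L v with L = cholesky(Kuu): a Gaussian q(v) = N(q_mu, q_sqrt q_sqrtᵀ) on v induces q(u) = N(L q_mu, L q_sqrt q_sqrtᵀ Lᵀ) on u. A NumPy sketch with illustrative matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4  # number of inducing variables

# A positive-definite Kuu and its Cholesky factor L, so that u = L v.
A = rng.normal(size=(M, M))
Kuu = A @ A.T + M * np.eye(M)
L = np.linalg.cholesky(Kuu)

# Whitened variational parameters: q(v) = N(q_mu, q_sqrt q_sqrtᵀ), prior p(v) = N(0, I).
q_mu = rng.normal(size=M)
q_sqrt = np.tril(rng.normal(size=(M, M)))

# The induced (non-whitened) distribution on u = L v:
u_mean = L @ q_mu
u_cov = L @ (q_sqrt @ q_sqrt.T) @ L.T

# Sanity check: if q(v) equals the prior N(0, I), u has the prior covariance Kuu.
prior_u_cov = L @ np.eye(M) @ L.T
```

The whitened prior N(0, I) is independent of the kernel hyperparameters, which typically makes optimisation better conditioned.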
- num_samples: int | None[source]#
The number of samples drawn when coercing the output distribution of this layer to a tf.Tensor. (See _convert_to_tensor_fn().)
- full_cov: bool[source]#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to inputs. If True, predict or sample with the full covariance over the inputs.
- full_output_cov: bool[source]#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to outputs. If True, predict or sample with the full covariance over the outputs.
- q_mu: gpflow.Parameter[source]#
The mean of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- q_sqrt: gpflow.Parameter[source]#
The lower-triangular Cholesky factor of the covariance of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- predict(inputs: gpflow.base.TensorType, *, full_cov: bool = False, full_output_cov: bool = False) Tuple[tf.Tensor, tf.Tensor] [source]#
Make a prediction at N test inputs for the Q outputs of this layer, including the mean function contribution.
The covariance and its shape are determined by full_cov and full_output_cov as follows:
(co)variance shape | full_output_cov=False | full_output_cov=True
full_cov=False | [N, Q] | [N, Q, Q]
full_cov=True | [Q, N, N] | [N, Q, N, Q]
- Parameters:
inputs – The inputs to predict at, with a shape of [N, D], where D is the input dimensionality of this layer.
full_cov – Whether to return the full covariance (if True) or the marginal variance (if False, the default) w.r.t. inputs.
full_output_cov – Whether to return the full covariance (if True) or the marginal variance (if False, the default) w.r.t. outputs.
- Returns:
The posterior mean (shape [N, Q]) and (co)variance (shape as in the table above) at the test points.
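The four shapes in the table are related by taking diagonals of the densest case. Starting from a full [N, Q, N, Q] covariance, each coarser setting marginalises one axis pair; a NumPy shape sketch (the tensor is random, only the shapes matter here):

```python
import numpy as np

N, Q = 5, 3
full = np.random.randn(N, Q, N, Q)  # full_cov=True, full_output_cov=True

# full_cov=True, full_output_cov=False: diagonal over outputs -> [Q, N, N]
cov_inputs = np.einsum("nqmq->qnm", full)

# full_cov=False, full_output_cov=True: diagonal over inputs -> [N, Q, Q]
cov_outputs = np.einsum("npnq->npq", full)

# full_cov=False, full_output_cov=False: both diagonals -> [N, Q]
marginals = np.einsum("nqnq->nq", full)
```

Each entry marginals[n, q] equals full[n, q, n, q], so the cheaper settings are exactly the diagonal slices of the full covariance.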
- call(inputs: gpflow.base.TensorType, *args: List[Any], **kwargs: Dict[str, Any]) tf.Tensor [source]#
The default behaviour upon calling this layer.
This method calls the call method of the tfp.layers.DistributionLambda super-class, which constructs a tfp.distributions.Distribution for the predictive distributions at the input points (see _make_distribution_fn()). You can pass this distribution to tf.convert_to_tensor, which will return samples from the distribution (see _convert_to_tensor_fn()).
This method also adds a layer-specific loss function, given by the KL divergence between this layer and the GP prior (scaled to be per-datapoint).
- prior_kl() tf.Tensor [source]#
Returns the KL divergence KL[q(u)∥p(u)] from the prior p(u) to the variational distribution q(u). If this layer uses the whitened representation, returns KL[q(v)∥p(v)] instead.
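For the whitened case with p(v) = N(0, I), this KL has the standard Gaussian closed form KL[N(m, SSᵀ) ∥ N(0, I)] = ½(tr(SSᵀ) + mᵀm − M − log det(SSᵀ)). A NumPy sketch of that formula (not the gpflow implementation):

```python
import numpy as np

def kl_to_standard_normal(m: np.ndarray, S: np.ndarray) -> float:
    """KL[N(m, S Sᵀ) || N(0, I)] for a lower-triangular factor S of size M x M."""
    M = m.shape[0]
    trace = np.sum(S**2)                       # tr(S Sᵀ)
    logdet = 2.0 * np.sum(np.log(np.diag(S)))  # log det(S Sᵀ)
    return 0.5 * (trace + m @ m - M - logdet)

# The KL is zero exactly when q equals the prior N(0, I):
kl_zero = kl_to_standard_normal(np.zeros(3), np.eye(3))
```

This term is what call() registers as the layer loss, scaled by num_data so the ELBO is expressed per datapoint.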
- _make_distribution_fn(previous_layer_outputs: gpflow.base.TensorType) tfp.distributions.Distribution [source]#
Construct the posterior distributions at the output points of the previous layer, depending on full_cov and full_output_cov.
- Parameters:
previous_layer_outputs – The output from the previous layer, which should be coercible to a tf.Tensor.
- _convert_to_tensor_fn(distribution: tfp.distributions.Distribution) tf.Tensor [source]#
Convert the predictive distributions at the input points (see _make_distribution_fn()) to a tensor of num_samples samples from that distribution. Whether the samples are correlated or marginal (uncorrelated) depends on full_cov and full_output_cov.
- sample() gpflux.sampling.sample.Sample [source]#
Todo
TODO: Document this.