gpflux.architectures.constant_input_dim_deep_gp#
This module provides `build_constant_input_dim_deep_gp()` to build a Deep GP of arbitrary depth where each hidden layer has the same input dimensionality as the data.
Module Contents#
- construct_basic_inducing_variables(num_inducing: int | List[int], input_dim: int, output_dim: int | None = None, share_variables: bool = False, z_init: numpy.ndarray | None = None) gpflow.inducing_variables.MultioutputInducingVariables [source]#
Construct a compatible `MultioutputInducingVariables` to use in `GPLayer`s.
- Parameters:
  - num_inducing – The total number of inducing variables, `M`. This parameter can be freely chosen by the user. General advice is to set it as high as possible, but smaller than the number of datapoints. The computational complexity of the layer is cubic in `M`. If a list is passed, each element in the list specifies the number of inducing variables to use for each `output_dim`.
  - input_dim – The dimensionality of the input data (or features) `X`. Typically, this corresponds to `X.shape[-1]`. For `InducingPoints`, this specifies the dimensionality of `Z`.
  - output_dim – The dimensionality of the outputs (or targets) `Y`. Typically, this corresponds to `Y.shape[-1]` or the number of latent GPs. The parameter is used to determine the number of inducing variable sets to create when a different set is used for each output. The parameter is redundant when `num_inducing` is a list, because the code assumes that `len(num_inducing) == output_dim`.
  - share_variables – If `True`, use the same inducing variables for different outputs. Otherwise, create a different set for each output. Set this parameter to `False` when `num_inducing` is a list, because otherwise the two arguments contradict each other. If you set this parameter to `True`, you must also specify `output_dim`, because that is used to determine the number of inducing variable sets to create.
  - z_init – Raw values to use to initialise `gpflow.inducing_variables.InducingPoints`. If `None` (the default), values will be initialised from `N(0, 1)`. The shape of `z_init` depends on the other input arguments. If a single set of inducing points is used for all outputs (that is, if `share_variables` is `True`), `z_init` should be rank two, with the dimensions `[M, input_dim]`. If a different set of inducing points is used for the outputs (that is, if `num_inducing` is a list, or if `share_variables` is `False`), `z_init` should be a rank-three tensor with the dimensions `[output_dim, M, input_dim]`.
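A minimal usage sketch (assuming these helpers are importable from `gpflux.helpers`, as in recent gpflux releases): one shared set of inducing variables versus a separate set per output.

```python
import numpy as np
from gpflux.helpers import construct_basic_inducing_variables

M, input_dim, output_dim = 20, 5, 3

# One shared set of M inducing points for all outputs; values initialised from N(0, 1).
shared_iv = construct_basic_inducing_variables(
    num_inducing=M, input_dim=input_dim, output_dim=output_dim, share_variables=True
)

# A separate set per output, with explicit initial values of shape [output_dim, M, input_dim].
z_init = np.random.randn(output_dim, M, input_dim)
separate_iv = construct_basic_inducing_variables(
    num_inducing=[M] * output_dim, input_dim=input_dim, z_init=z_init
)
```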
- construct_basic_kernel(kernels: gpflow.kernels.Kernel | List[gpflow.kernels.Kernel], output_dim: int | None = None, share_hyperparams: bool = False) gpflow.kernels.MultioutputKernel [source]#
Construct a `MultioutputKernel` to use in `GPLayer`s.
- Parameters:
  - kernels – A single kernel or list of `Kernel`s. When a single kernel is passed, the same kernel is used for all outputs. Depending on `share_hyperparams`, the hyperparameters will be shared across outputs. You must also specify `output_dim`. When a list of kernels is passed, each kernel in the list is used on a separate output dimension and a `gpflow.kernels.SeparateIndependent` is returned.
  - output_dim – The number of outputs. This is equal to the number of latent GPs in the `GPLayer`. When only a single kernel is specified for `kernels`, you must also specify `output_dim`. When a list of kernels is specified for `kernels`, we assume that `len(kernels) == output_dim`, and `output_dim` is not required.
  - share_hyperparams – If `True`, use the same type of kernel and the same hyperparameters (variance and lengthscales) for the different outputs. Otherwise, the same type of kernel (Squared-Exponential, Matern12, and so on) is used for the different outputs, but the kernel can have different hyperparameter values for each.
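A minimal sketch (import path assumed to be `gpflux.helpers`) contrasting the two ways of passing `kernels`:

```python
import gpflow
from gpflux.helpers import construct_basic_kernel

output_dim = 3

# A single kernel plus output_dim: the same kernel type is used for every output.
# With share_hyperparams=True the outputs also share hyperparameter values.
shared_kernel = construct_basic_kernel(
    gpflow.kernels.SquaredExponential(), output_dim=output_dim, share_hyperparams=True
)

# A list of kernels: one kernel per output dimension; a SeparateIndependent kernel is returned.
separate_kernel = construct_basic_kernel(
    [gpflow.kernels.SquaredExponential() for _ in range(output_dim)]
)
```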
- construct_mean_function(X: numpy.ndarray, D_in: int, D_out: int) gpflow.mean_functions.MeanFunction [source]#
Return `gpflow.mean_functions.Identity` when `D_in` and `D_out` are equal. Otherwise, use the principal components of the input matrix `X` to build a `Linear` mean function.
Note
The returned mean function is set to be untrainable. To change this, use `gpflow.set_trainable()`.
- Parameters:
  - X – A data array with the shape `[N, D_in]` used to determine the principal components to use to create a `Linear` mean function when `D_in != D_out`.
  - D_in – The dimensionality of the input data (or features) `X`. Typically, this corresponds to `X.shape[-1]`.
  - D_out – The dimensionality of the outputs (or targets) `Y`. Typically, this corresponds to `Y.shape[-1]` or the number of latent GPs in the layer.
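A minimal sketch (import path assumed to be `gpflux.helpers`): the helper returns `Identity` when the dimensionalities match, and an untrainable PCA-based `Linear` mean function otherwise.

```python
import numpy as np
from gpflux.helpers import construct_mean_function

X = np.random.randn(100, 5)

mean_fn = construct_mean_function(X, D_in=5, D_out=2)      # Linear, built from the principal components of X
identity_fn = construct_mean_function(X, D_in=5, D_out=5)  # Identity, since D_in == D_out
```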
- class GPLayer(kernel: gpflow.kernels.MultioutputKernel, inducing_variable: gpflow.inducing_variables.MultioutputInducingVariables, num_data: int, mean_function: gpflow.mean_functions.MeanFunction | None = None, *, num_samples: int | None = None, full_cov: bool = False, full_output_cov: bool = False, num_latent_gps: int = None, whiten: bool = True, name: str | None = None, verbose: bool = True)[source]#
Bases: `tfp.layers.DistributionLambda`
A sparse variational multioutput GP layer. This layer holds the kernel, the inducing variables, the variational distribution, and the mean function.
- Parameters:
  - kernel – The multioutput kernel for this layer.
  - inducing_variable – The inducing features for this layer.
  - num_data – The number of points in the training dataset (see `num_data`).
  - mean_function – The mean function that will be applied to the inputs. Default: `Identity`.
    Note
    The `Identity` mean function requires the input and output dimensionality of this layer to be the same. If you want to change the dimensionality in a layer, you may want to provide a `Linear` mean function instead.
  - num_samples – The number of samples to draw when converting the `DistributionLambda` into a `tf.Tensor`; see `_convert_to_tensor_fn()`. Will be stored in the `num_samples` attribute. If `None` (the default), draw a single sample without prefixing the sample shape (see `tfp.distributions.Distribution`’s `sample()` method).
  - full_cov – Sets the default behaviour of calling this layer (`full_cov` attribute): if `False` (the default), only predict marginals (diagonal of covariance) with respect to inputs. If `True`, predict full covariance over inputs.
  - full_output_cov – Sets the default behaviour of calling this layer (`full_output_cov` attribute): if `False` (the default), only predict marginals (diagonal of covariance) with respect to outputs. If `True`, predict full covariance over outputs.
  - num_latent_gps – The number of (latent) GPs in the layer (which can be different from the number of outputs, e.g. with a `LinearCoregionalization` kernel). This is used to determine the size of the variational parameters `q_mu` and `q_sqrt`. If possible, it is inferred from the kernel and `inducing_variable`.
  - whiten – If `True` (the default), uses the whitened parameterisation of the inducing variables; see `whiten`.
  - name – The name of this layer.
  - verbose – The verbosity mode. Set this parameter to `True` to show debug information.
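A minimal sketch combining the helpers above into a `GPLayer` (import paths assumed to be `gpflux.helpers` and `gpflux.layers`). A `Zero` mean function is passed explicitly because the default `Identity` requires the input and output dimensionalities to match.

```python
import numpy as np
import gpflow
from gpflux.helpers import construct_basic_kernel, construct_basic_inducing_variables
from gpflux.layers import GPLayer

num_data, input_dim, output_dim, M = 100, 5, 3, 20

kernel = construct_basic_kernel(
    gpflow.kernels.SquaredExponential(), output_dim=output_dim, share_hyperparams=True
)
inducing_variable = construct_basic_inducing_variables(
    M, input_dim=input_dim, output_dim=output_dim, share_variables=True
)
gp_layer = GPLayer(
    kernel,
    inducing_variable,
    num_data=num_data,
    num_latent_gps=output_dim,
    mean_function=gpflow.mean_functions.Zero(),
)

# With the defaults full_cov=False and full_output_cov=False, predict() returns
# marginal means and variances, both of shape [N, output_dim].
X_test = np.random.randn(10, input_dim)
f_mean, f_var = gp_layer.predict(X_test)
```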
- num_data: int#
The number of points in the training dataset. This information is used to obtain the correct scaling between the data-fit and the KL term in the evidence lower bound (ELBO).
- whiten: bool#
This parameter determines the parameterisation of the inducing variables.
If `True`, this layer uses the whitened (or non-centred) representation, in which (in the case of inducing-point inducing variables) `u = f(Z) = cholesky(Kuu) v`, and we parameterise an approximate posterior on `v` as `q(v) = N(q_mu, q_sqrt q_sqrtᵀ)`. The prior on `v` is `p(v) = N(0, I)`.
If `False`, this layer uses the non-whitened (or centred) representation, in which we directly parameterise `q(u) = N(q_mu, q_sqrt q_sqrtᵀ)`. The prior on `u` is `p(u) = N(0, Kuu)`.
- num_samples: int | None#
The number of samples drawn when coercing the output distribution of this layer to a `tf.Tensor`. (See `_convert_to_tensor_fn()`.)
- full_cov: bool#
This parameter determines the behaviour of calling this layer. If `False`, only predict or sample marginals (diagonal of covariance) with respect to inputs. If `True`, predict or sample with the full covariance over the inputs.
- full_output_cov: bool#
This parameter determines the behaviour of calling this layer. If `False`, only predict or sample marginals (diagonal of covariance) with respect to outputs. If `True`, predict or sample with the full covariance over the outputs.
- q_mu: gpflow.Parameter#
The mean of `q(v)` or `q(u)` (depending on whether the whitened parameterisation is used).
- q_sqrt: gpflow.Parameter#
The lower-triangular Cholesky factor of the covariance of `q(v)` or `q(u)` (depending on whether the whitened parameterisation is used).
- predict(inputs: gpflow.base.TensorType, *, full_cov: bool = False, full_output_cov: bool = False) Tuple[tf.Tensor, tf.Tensor] [source]#
Make a prediction at N test inputs for the Q outputs of this layer, including the mean function contribution.
The covariance and its shape are determined by `full_cov` and `full_output_cov` as follows:
| (co)variance shape | full_output_cov=False | full_output_cov=True |
| --- | --- | --- |
| full_cov=False | [N, Q] | [N, Q, Q] |
| full_cov=True | [Q, N, N] | [N, Q, N, Q] |
- Parameters:
  - inputs – The inputs to predict at, with a shape of [N, D], where D is the input dimensionality of this layer.
  - full_cov – Whether to return full covariance (if `True`) or marginal variance (if `False`, the default) w.r.t. inputs.
  - full_output_cov – Whether to return full covariance (if `True`) or marginal variance (if `False`, the default) w.r.t. outputs.
- Returns:
The posterior mean (shape [N, Q]) and (co)variance (shape as above) at the test points.
- call(inputs: gpflow.base.TensorType, *args: List[Any], **kwargs: Dict[str, Any]) tf.Tensor [source]#
The default behaviour upon calling this layer.
This method calls the `tfp.layers.DistributionLambda` super-class `call` method, which constructs a `tfp.distributions.Distribution` for the predictive distributions at the input points (see `_make_distribution_fn()`). You can pass this distribution to `tf.convert_to_tensor`, which will return samples from the distribution (see `_convert_to_tensor_fn()`).
This method also adds a layer-specific loss function, given by the KL divergence between this layer and the GP prior (scaled to per-datapoint).
- prior_kl() tf.Tensor [source]#
Returns the KL divergence `KL[q(u)∥p(u)]` from the prior `p(u)` to the variational distribution `q(u)`. If this layer uses the whitened representation, returns `KL[q(v)∥p(v)]`.
- _make_distribution_fn(previous_layer_outputs: gpflow.base.TensorType) tfp.distributions.Distribution [source]#
Construct the posterior distributions at the output points of the previous layer, depending on `full_cov` and `full_output_cov`.
- Parameters:
  - previous_layer_outputs – The output from the previous layer, which should be coercible to a `tf.Tensor`.
- _convert_to_tensor_fn(distribution: tfp.distributions.Distribution) tf.Tensor [source]#
Convert the predictive distributions at the input points (see `_make_distribution_fn()`) to a tensor of `num_samples` samples from that distribution. Whether the samples are correlated or marginal (uncorrelated) depends on `full_cov` and `full_output_cov`.
- sample() gpflux.sampling.sample.Sample [source]#
Todo
TODO: Document this.
- class LikelihoodLayer(likelihood: gpflow.likelihoods.Likelihood)[source]#
Bases: `gpflux.layers.trackable_layer.TrackableLayer`
A Keras layer that wraps a GPflow `Likelihood`. This layer expects a `tfp.distributions.MultivariateNormalDiag` as its input, describing `q(f)`. When training, calling this class computes the negative variational expectation \(-\mathbb{E}_{q(f)}[\log p(y|f)]\) and adds it as a layer loss. When not training, it computes the mean and variance of `y` under `q(f)` using `predict_mean_and_var()`.
Note
Use either this `LikelihoodLayer` (together with `gpflux.models.DeepGP`) or `LikelihoodLoss` (e.g. together with a `tf.keras.Sequential` model). Do not use both at once because this would add the loss twice.
- call(inputs: tfp.distributions.MultivariateNormalDiag, targets: gpflow.base.TensorType | None = None, training: bool = None) LikelihoodOutputs [source]#
When training (`training=True`), this method computes variational expectations (data-fit loss) and adds this information as a layer loss. When testing (the default), it computes the posterior mean and variance of `y`.
- Parameters:
  - inputs – The output distribution of the previous layer. This is currently expected to be a `MultivariateNormalDiag`; that is, the preceding `GPLayer` should have `full_cov=full_output_cov=False`.
- Returns:
  A `LikelihoodOutputs` tuple with the mean and variance of `f` and, if not training, the mean and variance of `y`.
Todo
Turn this layer into a `DistributionLambda` as well and return the correct `Distribution` instead of a tuple containing mean and variance only.
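A minimal sketch of the two alternatives mentioned in the note above (import paths `gpflux.layers.LikelihoodLayer` and `gpflux.losses.LikelihoodLoss` are assumed); use one or the other, never both.

```python
import gpflow
from gpflux.layers import LikelihoodLayer
from gpflux.losses import LikelihoodLoss

# Option 1: a LikelihoodLayer, used together with gpflux.models.DeepGP.
# The data-fit term is added as a layer loss, so no loss is passed to model.compile().
likelihood_layer = LikelihoodLayer(gpflow.likelihoods.Gaussian(variance=0.1))

# Option 2: a LikelihoodLoss, used as the Keras loss of e.g. a tf.keras.Sequential
# model built from GPLayers; in that case no LikelihoodLayer is added to the model.
likelihood_loss = LikelihoodLoss(gpflow.likelihoods.Gaussian(variance=0.1))
```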
- class DeepGP(f_layers: List[gpflow.keras.tf_keras.layers.Layer], likelihood: gpflux.layers.LikelihoodLayer | gpflow.likelihoods.Likelihood, *, input_dim: int | None = None, target_dim: int | None = None, default_model_class: Type[gpflow.keras.tf_keras.Model] = tf_keras.Model, num_data: int | None = None)[source]#
Bases: `gpflow.base.Module`
This class combines a sequential function model `f(x) = fₙ(⋯ (f₂(f₁(x))))` and a likelihood `p(y|f)`.
Layers might depend on both inputs `x` and targets `y` during training by inheriting from `LayerWithObservations`; those will be passed the argument `observations=[inputs, targets]`.
When data is used with methods in this class (e.g. the `predict_f()` method), it needs to have a `dtype` corresponding to GPflow’s default dtype as given by `default_float()`.
Note
This class is not a `tf.keras.Model` subclass itself. To access Keras features, call either `as_training_model()` or `as_prediction_model()` (depending on the use-case) to create a `tf.keras.Model` instance. See the method docstrings for more details.
- Parameters:
  - f_layers – The layers `[f₁, f₂, …, fₙ]` describing the latent function `f(x) = fₙ(⋯ (f₂(f₁(x))))`.
  - likelihood – The layer for the likelihood `p(y|f)`. If this is a GPflow likelihood, it will be wrapped in a `LikelihoodLayer`. Alternatively, you can provide a `LikelihoodLayer` explicitly.
  - input_dim – The input dimensionality.
  - target_dim – The target dimensionality.
  - default_model_class – The default for the `model_class` argument of `as_training_model()` and `as_prediction_model()`; see the `default_model_class` attribute.
  - num_data – The number of points in the training dataset; see the `num_data` attribute. If you do not specify a value for this parameter explicitly, it is automatically detected from the `num_data` attribute in the GP layers.
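A minimal end-to-end sketch: a single-layer DeepGP assembled by hand and trained through the Keras model returned by `as_training_model()`. Import paths and the exact Keras/TensorFlow integration are assumptions that should match your gpflux installation.

```python
import numpy as np
import tensorflow as tf
import gpflow
from gpflux.helpers import construct_basic_kernel, construct_basic_inducing_variables
from gpflux.layers import GPLayer
from gpflux.models import DeepGP

X = np.random.randn(100, 1)                      # float64, matching GPflow's default dtype
Y = np.sin(3 * X) + 0.1 * np.random.randn(100, 1)
num_data, input_dim, output_dim, M = X.shape[0], X.shape[1], Y.shape[1], 20

kernel = construct_basic_kernel(
    gpflow.kernels.SquaredExponential(), output_dim=output_dim, share_hyperparams=True
)
inducing_variable = construct_basic_inducing_variables(
    M, input_dim=input_dim, output_dim=output_dim, share_variables=True
)
gp_layer = GPLayer(kernel, inducing_variable, num_data=num_data)

# A GPflow likelihood is wrapped in a LikelihoodLayer automatically.
deep_gp = DeepGP([gp_layer], gpflow.likelihoods.Gaussian(), num_data=num_data)

model = deep_gp.as_training_model()
model.compile(tf.keras.optimizers.Adam(0.01))    # no loss argument
model.fit({"inputs": X, "targets": Y}, epochs=10, verbose=0)

f_mean, f_var = deep_gp.predict_f(X)
```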
- f_layers: List[gpflow.keras.tf_keras.layers.Layer]#
A list of all layers in this DeepGP (just `likelihood_layer` is separate).
- likelihood_layer: gpflux.layers.LikelihoodLayer#
The likelihood layer.
- default_model_class: Type[gpflow.keras.tf_keras.Model]#
The default for the `model_class` argument of `as_training_model()` and `as_prediction_model()`. This must have the same semantics as `tf.keras.Model`; that is, it must accept a list of inputs and an output. This could be `tf.keras.Model` itself or `gpflux.optimization.NatGradModel` (but not, for example, `tf.keras.Sequential`).
- num_data: int#
The number of points in the training dataset. This information is used to obtain the correct scaling between the data-fit and the KL term in the evidence lower bound (`elbo()`).
- static _validate_num_data(f_layers: List[gpflow.keras.tf_keras.layers.Layer], num_data: int | None = None) int [source]#
Check that the `num_data` attributes of all layers in `f_layers` are consistent with each other and with the (optional) `num_data` argument.
- Returns:
The validated number of datapoints.
- static _validate_dtype(x: gpflow.base.TensorType) None [source]#
Check that the data `x` is of the correct `dtype`, corresponding to GPflow’s default dtype as defined by `default_float()`.
- Raises:
  - ValueError – If `x` is of an incorrect `dtype`.
- _evaluate_deep_gp(inputs: gpflow.base.TensorType, targets: gpflow.base.TensorType | None, training: bool | None = None) tf.Tensor [source]#
Evaluate `f(x) = fₙ(⋯ (f₂(f₁(x))))` on the `inputs` argument.
Layers that inherit from `LayerWithObservations` are passed the additional keyword argument `observations=[inputs, targets]` if `targets` contains a value, or `observations=None` when `targets` is `None`.
- _evaluate_likelihood(f_outputs: gpflow.base.TensorType, targets: gpflow.base.TensorType | None, training: bool | None = None) tf.Tensor [source]#
Call the `likelihood_layer` on `f_outputs`, which adds the corresponding layer loss when training.
- predict_f(inputs: gpflow.base.TensorType) Tuple[tf.Tensor, tf.Tensor] [source]#
- Returns:
  The mean and variance (not the scale!) of `f`, for compatibility with GPflow models.
- Raises:
  - ValueError – If `x` is of an incorrect `dtype`.
Note
This method does not support `full_cov` or `full_output_cov`.
- elbo(data: Tuple[gpflow.base.TensorType, gpflow.base.TensorType]) tf.Tensor [source]#
- Returns:
The ELBO (not the per-datapoint loss!), for compatibility with GPflow models.
- as_training_model(model_class: Type[gpflow.keras.tf_keras.Model] | None = None) gpflow.keras.tf_keras.Model [source]#
Construct a `tf.keras.Model` instance that requires you to provide both `inputs` and `targets` to its call. This information is required for training the model, because the `targets` need to be passed to the `likelihood_layer` (and to `LayerWithObservations` instances such as `LatentVariableLayer`s, if present).
When compiling the returned model, do not provide any additional losses (this is handled by the `likelihood_layer`).
Train with

```python
model.compile(optimizer)  # do NOT pass a loss here
model.fit({"inputs": X, "targets": Y}, ...)
```

See Keras’s Endpoint layer pattern for more details.
Note
Use `as_prediction_model` if you want only to predict, and do not want to pass in a dummy array for the targets.
- Parameters:
  - model_class – The model class to use; overrides `default_model_class`.
- as_prediction_model(model_class: Type[gpflow.keras.tf_keras.Model] | None = None) gpflow.keras.tf_keras.Model [source]#
Construct a `tf.keras.Model` instance that requires only `inputs`, which means you do not have to provide dummy target values when predicting at test points.
Predict with

```python
model.predict(Xtest, ...)
```

Note
The returned model will not support training; for that, use `as_training_model`.
- Parameters:
  - model_class – The model class to use; overrides `default_model_class`.
- class Config[source]#
The configuration used by `build_constant_input_dim_deep_gp()`.
- num_inducing: int[source]#
The number of inducing variables, M. The Deep GP uses the same number of inducing variables in each layer.
- inner_layer_qsqrt_factor: float[source]#
A multiplicative factor used to rescale the hidden layers’ `q_sqrt`. Typically this value is chosen to be small (e.g., 1e-5) to reduce noise at the start of training.
- likelihood_noise_variance: float[source]#
The variance of the `Gaussian` likelihood that is used by the Deep GP.
- whiten: bool = True[source]#
Determines the parameterisation of the inducing variables. If `True`, `p(u) = N(0, I)`; otherwise, `p(u) = N(0, Kuu)`.
See also: `gpflux.layers.GPLayer.whiten`.
- _construct_kernel(input_dim: int, is_last_layer: bool) gpflow.kernels.SquaredExponential [source]#
Return a `gpflow.kernels.SquaredExponential` kernel with ARD lengthscales set to 2 and a small kernel variance of 1e-6 if the kernel is part of a hidden layer; otherwise, the kernel variance is set to 1.0.
- Parameters:
  - input_dim – The input dimensionality of the layer.
  - is_last_layer – Whether the kernel is part of the last layer in the Deep GP.
- build_constant_input_dim_deep_gp(X: numpy.ndarray, num_layers: int, config: Config) gpflux.models.DeepGP [source]#
Build a Deep GP consisting of `num_layers` `GPLayer`s. All the hidden layers have the same input dimension as the data, that is, `X.shape[1]`.
The architecture is largely based on Salimbeni and Deisenroth [SD17], with the most notable difference being that we keep the hidden dimension equal to the input dimensionality of the data.
Note
This architecture might be slow for high-dimensional data.
Note
This architecture assumes a `Gaussian` likelihood for regression tasks. Specify a different likelihood for performing other tasks such as classification.
- Parameters:
  - X – The training input data, used to retrieve the number of datapoints and the input dimension and to initialise the inducing point locations using k-means. A tensor of rank two with the dimensions `[num_data, input_dim]`.
  - num_layers – The number of layers in the Deep GP.
  - config – The configuration for (hyper)parameters. See `Config` for details.
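A minimal sketch of the builder, using the `Config` fields documented above; the training loop mirrors `as_training_model()` and is an assumption about the surrounding Keras setup rather than part of this function.

```python
import numpy as np
import tensorflow as tf
from gpflux.architectures import Config, build_constant_input_dim_deep_gp

X = np.random.randn(200, 4)                       # [num_data, input_dim], float64
Y = np.random.randn(200, 1)

config = Config(
    num_inducing=25,
    inner_layer_qsqrt_factor=1e-5,
    likelihood_noise_variance=1e-2,
    whiten=True,
)
deep_gp = build_constant_input_dim_deep_gp(X, num_layers=2, config=config)

model = deep_gp.as_training_model()
model.compile(tf.keras.optimizers.Adam(0.01))     # the likelihood layer supplies the loss
model.fit({"inputs": X, "targets": Y}, epochs=10, verbose=0)

f_mean, f_var = deep_gp.predict_f(X)
```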