gpflux.helpers#
This module contains helper functions for constructing MultioutputKernel,
MultioutputInducingVariables,
MeanFunction, and GPLayer objects.
Module Contents#
- class GPLayer(kernel: gpflow.kernels.MultioutputKernel, inducing_variable: gpflow.inducing_variables.MultioutputInducingVariables, num_data: int, mean_function: gpflow.mean_functions.MeanFunction | None = None, *, num_samples: int | None = None, full_cov: bool = False, full_output_cov: bool = False, num_latent_gps: int = None, whiten: bool = True, name: str | None = None, verbose: bool = True)[source]#
Bases: tfp.layers.DistributionLambda

A sparse variational multioutput GP layer. This layer holds the kernel, inducing variables, variational distribution, and mean function.
- Parameters:
  - kernel – The multioutput kernel for this layer.
  - inducing_variable – The inducing features for this layer.
  - num_data – The number of points in the training dataset (see num_data).
  - mean_function – The mean function that will be applied to the inputs. Default: Identity.

    Note: The Identity mean function requires the input and output dimensionality of this layer to be the same. If you want to change the dimensionality in a layer, you may want to provide a Linear mean function instead.

  - num_samples – The number of samples to draw when converting the DistributionLambda into a tf.Tensor; see _convert_to_tensor_fn(). Will be stored in the num_samples attribute. If None (the default), draw a single sample without prefixing the sample shape (see the sample() method of tfp.distributions.Distribution).
  - full_cov – Sets the default behaviour of calling this layer (the full_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to inputs; if True, predict the full covariance over inputs.
  - full_output_cov – Sets the default behaviour of calling this layer (the full_output_cov attribute): if False (the default), only predict marginals (diagonal of covariance) with respect to outputs; if True, predict the full covariance over outputs.
  - num_latent_gps – The number of (latent) GPs in the layer (which can be different from the number of outputs, e.g. with a LinearCoregionalization kernel). This is used to determine the size of the variational parameters q_mu and q_sqrt. If possible, it is inferred from the kernel and inducing_variable.
  - whiten – If True (the default), uses the whitened parameterisation of the inducing variables; see whiten.
  - name – The name of this layer.
  - verbose – The verbosity mode. Set this parameter to True to show debug information.
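A minimal construction sketch, assuming a toy problem size and a SquaredExponential kernel; it relies on the construct_basic_kernel and construct_basic_inducing_variables helpers documented further down this page, and passes an explicit Zero mean function because input and output dimensionality differ (see the note on Identity above).

```python
import gpflow
from gpflux.helpers import construct_basic_kernel, construct_basic_inducing_variables
from gpflux.layers import GPLayer

# Illustrative sizes: 100 datapoints, 5 input features, 3 outputs.
num_data, input_dim, output_dim = 100, 5, 3

kernel = construct_basic_kernel(
    gpflow.kernels.SquaredExponential(), output_dim=output_dim, share_hyperparams=True
)
inducing_variable = construct_basic_inducing_variables(
    num_inducing=20, input_dim=input_dim, output_dim=output_dim, share_variables=True
)

# Zero mean function, because input_dim != output_dim rules out the Identity default.
gp_layer = GPLayer(
    kernel,
    inducing_variable,
    num_data=num_data,
    mean_function=gpflow.mean_functions.Zero(),
)
```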
- num_data: int#
The number of points in the training dataset. This information is used to obtain the correct scaling between the data-fit and the KL term in the evidence lower bound (ELBO).
- whiten: bool#
This parameter determines the parameterisation of the inducing variables.
If True, this layer uses the whitened (or non-centred) representation, in which (taking inducing-point inducing variables as an example) u = f(Z) = cholesky(Kuu) v, and we parameterise an approximate posterior on v as q(v) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on v is p(v) = N(0, I).

If False, this layer uses the non-whitened (or centred) representation, in which we directly parameterise q(u) = N(q_mu, q_sqrt q_sqrtᵀ). The prior on u is p(u) = N(0, Kuu).
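Written out compactly, the two parameterisations described above are (with \(L = \operatorname{cholesky}(K_{uu})\) for inducing-point inducing variables):

\[
\begin{aligned}
\textit{whitened:}\quad & u = f(Z) = L v, \quad q(v) = \mathcal{N}(\texttt{q\_mu},\; \texttt{q\_sqrt}\,\texttt{q\_sqrt}^{\top}), \quad p(v) = \mathcal{N}(0, I), \\
\textit{non-whitened:}\quad & q(u) = \mathcal{N}(\texttt{q\_mu},\; \texttt{q\_sqrt}\,\texttt{q\_sqrt}^{\top}), \quad p(u) = \mathcal{N}(0, K_{uu}).
\end{aligned}
\]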
- num_samples: int | None#
The number of samples drawn when coercing the output distribution of this layer to a tf.Tensor (see _convert_to_tensor_fn()).
- full_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to inputs. If True, predict or sample with the full covariance over the inputs.
- full_output_cov: bool#
This parameter determines the behaviour of calling this layer. If False, only predict or sample marginals (diagonal of covariance) with respect to outputs. If True, predict or sample with the full covariance over the outputs.
- q_mu: gpflow.Parameter#
The mean of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- q_sqrt: gpflow.Parameter#
The lower-triangular Cholesky factor of the covariance of q(v) or q(u) (depending on whether the whitened parametrisation is used).
- predict(inputs: gpflow.base.TensorType, *, full_cov: bool = False, full_output_cov: bool = False) Tuple[tf.Tensor, tf.Tensor][source]#
Make a prediction at N test inputs for the Q outputs of this layer, including the mean function contribution.
The covariance and its shape are determined by full_cov and full_output_cov as follows:

| (co)variance shape | full_output_cov=False | full_output_cov=True |
|---|---|---|
| full_cov=False | [N, Q] | [N, Q, Q] |
| full_cov=True | [Q, N, N] | [N, Q, N, Q] |
- Parameters:
  - inputs – The inputs to predict at, with a shape of [N, D], where D is the input dimensionality of this layer.
  - full_cov – Whether to return the full covariance (if True) or the marginal variance (if False, the default) w.r.t. inputs.
  - full_output_cov – Whether to return the full covariance (if True) or the marginal variance (if False, the default) w.r.t. outputs.
- Returns:
The posterior mean (shape [N, Q]) and (co)variance (shape as above) at the test points.
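Continuing the construction sketch above (so Q = 3 outputs), a quick illustration of how the covariance shape changes with these flags; the N = 7 test inputs are arbitrary random values.

```python
import numpy as np

X_test = np.random.randn(7, input_dim)  # N = 7 test inputs

mean, cov = gp_layer.predict(X_test)                         # mean: [7, 3], cov: [7, 3]
_, cov_full = gp_layer.predict(X_test, full_cov=True)        # cov: [3, 7, 7]
_, cov_out = gp_layer.predict(X_test, full_output_cov=True)  # cov: [7, 3, 3]
```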
- call(inputs: gpflow.base.TensorType, *args: List[Any], **kwargs: Dict[str, Any]) tf.Tensor[source]#
The default behaviour upon calling this layer.
This method calls the call method of the tfp.layers.DistributionLambda superclass, which constructs a tfp.distributions.Distribution for the predictive distributions at the input points (see _make_distribution_fn()). You can pass this distribution to tf.convert_to_tensor, which will return samples from the distribution (see _convert_to_tensor_fn()).

This method also adds a layer-specific loss function, given by the KL divergence between this layer and the GP prior (scaled to be per-datapoint).
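A small sketch of this behaviour, continuing the construction example above; calling the layer directly on an array (outside a full Keras model) is assumed to be acceptable here.

```python
import numpy as np
import tensorflow as tf

X = np.random.randn(num_data, input_dim)

dist = gp_layer(X)                    # distribution-like output from DistributionLambda
samples = tf.convert_to_tensor(dist)  # coerced to samples via _convert_to_tensor_fn()
print(gp_layer.losses)                # includes the per-datapoint-scaled KL term
```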
- prior_kl() tf.Tensor[source]#
Returns the KL divergence KL[q(u) ∥ p(u)] from the prior p(u) to the variational distribution q(u). If this layer uses the whitened representation, returns KL[q(v) ∥ p(v)] instead.
- _make_distribution_fn(previous_layer_outputs: gpflow.base.TensorType) tfp.distributions.Distribution[source]#
Construct the posterior distributions at the output points of the previous layer, depending on full_cov and full_output_cov.
- Parameters:
  - previous_layer_outputs – The output from the previous layer, which should be coercible to a tf.Tensor.
- _convert_to_tensor_fn(distribution: tfp.distributions.Distribution) tf.Tensor[source]#
Convert the predictive distributions at the input points (see _make_distribution_fn()) to a tensor of num_samples samples from that distribution. Whether the samples are correlated or marginal (uncorrelated) depends on full_cov and full_output_cov.
- sample() gpflux.sampling.sample.Sample[source]#
Todo
TODO: Document this.
- construct_basic_kernel(kernels: gpflow.kernels.Kernel | List[gpflow.kernels.Kernel], output_dim: int | None = None, share_hyperparams: bool = False) gpflow.kernels.MultioutputKernel[source]#
Construct a MultioutputKernel to use in GPLayers; see the usage sketch below.
- Parameters:
  - kernels – A single kernel or list of Kernels.
    - When a single kernel is passed, the same kernel is used for all outputs. Depending on share_hyperparams, the hyperparameters will be shared across outputs. You must also specify output_dim.
    - When a list of kernels is passed, each kernel in the list is used on a separate output dimension and a gpflow.kernels.SeparateIndependent is returned.
  - output_dim – The number of outputs. This is equal to the number of latent GPs in the GPLayer. When only a single kernel is specified for kernels, you must also specify output_dim. When a list of kernels is specified for kernels, we assume that len(kernels) == output_dim, and output_dim is not required.
  - share_hyperparams – If True, use the same type of kernel and the same hyperparameters (variance and lengthscales) for the different outputs. Otherwise, the same type of kernel (Squared-Exponential, Matern12, and so on) is used for the different outputs, but the kernel can have different hyperparameter values for each.
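A brief sketch of the two calling conventions; the kernel choices and output dimensionality are illustrative.

```python
import gpflow
from gpflux.helpers import construct_basic_kernel

# Single kernel replicated across 3 outputs; output_dim is then required.
shared = construct_basic_kernel(
    gpflow.kernels.Matern52(), output_dim=3, share_hyperparams=True
)

# One kernel per output; a SeparateIndependent kernel is returned.
separate = construct_basic_kernel(
    [gpflow.kernels.SquaredExponential() for _ in range(3)]
)
```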
- construct_basic_inducing_variables(num_inducing: int | List[int], input_dim: int, output_dim: int | None = None, share_variables: bool = False, z_init: numpy.ndarray | None = None) gpflow.inducing_variables.MultioutputInducingVariables[source]#
Construct a compatible MultioutputInducingVariables to use in GPLayers; see the usage sketch below.
- Parameters:
  - num_inducing – The total number of inducing variables, M. This parameter can be freely chosen by the user. General advice is to set it as high as possible, but smaller than the number of datapoints. The computational complexity of the layer is cubic in M. If a list is passed, each element in the list specifies the number of inducing variables to use for each output_dim.
  - input_dim – The dimensionality of the input data (or features) X. Typically, this corresponds to X.shape[-1]. For InducingPoints, this specifies the dimensionality of Z.
  - output_dim – The dimensionality of the outputs (or targets) Y. Typically, this corresponds to Y.shape[-1] or the number of latent GPs. The parameter is used to determine the number of inducing variable sets to create when a different set is used for each output. The parameter is redundant when num_inducing is a list, because the code assumes that len(num_inducing) == output_dim.
  - share_variables – If True, use the same inducing variables for different outputs. Otherwise, create a different set for each output. Set this parameter to False when num_inducing is a list, because otherwise the two arguments contradict each other. If you set this parameter to True, you must also specify output_dim, because that is used to determine the number of inducing variable sets to create.
  - z_init – Raw values to use to initialise gpflow.inducing_variables.InducingPoints. If None (the default), values will be initialised from N(0, 1). The shape of z_init depends on the other input arguments. If a single set of inducing points is used for all outputs (that is, if share_variables is True), z_init should be rank two, with the dimensions [M, input_dim]. If a different set of inducing points is used for the outputs (that is, if num_inducing is a list, or if share_variables is False), z_init should be a rank-three tensor with the dimensions [output_dim, M, input_dim].
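A sketch of the two z_init shapes described above; the values of M, input_dim, and output_dim and the random initial locations are illustrative.

```python
import numpy as np
from gpflux.helpers import construct_basic_inducing_variables

M, input_dim, output_dim = 20, 5, 3

# One shared set of M inducing points for all outputs: z_init has shape [M, input_dim].
shared_iv = construct_basic_inducing_variables(
    num_inducing=M, input_dim=input_dim, output_dim=output_dim,
    share_variables=True, z_init=np.random.randn(M, input_dim),
)

# A separate set per output: z_init has shape [output_dim, M, input_dim].
separate_iv = construct_basic_inducing_variables(
    num_inducing=[M] * output_dim, input_dim=input_dim,
    z_init=np.random.randn(output_dim, M, input_dim),
)
```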
- construct_mean_function(X: numpy.ndarray, D_in: int, D_out: int) gpflow.mean_functions.MeanFunction[source]#
Return gpflow.mean_functions.Identity when D_in and D_out are equal. Otherwise, use the principal components of the input matrix X to build a Linear mean function.

Note: The returned mean function is set to be untrainable. To change this, use gpflow.set_trainable().

- Parameters:
  - X – A data array with the shape [N, D_in] used to determine the principal components to use to create a Linear mean function when D_in != D_out.
  - D_in – The dimensionality of the input data (or features) X. Typically, this corresponds to X.shape[-1].
  - D_out – The dimensionality of the outputs (or targets) Y. Typically, this corresponds to Y.shape[-1] or the number of latent GPs in the layer.
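A short sketch of how the returned mean function depends on D_in and D_out; the random data stands in for real inputs.

```python
import numpy as np
from gpflux.helpers import construct_mean_function

X = np.random.randn(100, 5)

identity_mf = construct_mean_function(X, D_in=5, D_out=5)  # Identity (dims match)
linear_mf = construct_mean_function(X, D_in=5, D_out=2)    # Linear, built from PCA of X
```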
- construct_gp_layer(num_data: int, num_inducing: int, input_dim: int, output_dim: int, kernel_class: Type[gpflow.kernels.Stationary] = gpflow.kernels.SquaredExponential, z_init: numpy.ndarray | None = None, name: str | None = None) gpflux.layers.gp_layer.GPLayer[source]#
Build a vanilla GP layer with a single kernel shared among all outputs, shared inducing point variables, and a zero mean function; see the usage sketch below.
- Parameters:
  - num_data – The total number of datapoints in the dataset, N. Typically corresponds to X.shape[0] == len(X).
  - num_inducing – The total number of inducing variables, M. This parameter can be freely chosen by the user. General advice is to pick it as high as possible, but smaller than N. The computational complexity of the layer is cubic in M.
  - input_dim – The dimensionality of the input data (or features) X. Typically, this corresponds to X.shape[-1].
  - output_dim – The dimensionality of the outputs (or targets) Y. Typically, this corresponds to Y.shape[-1].
  - kernel_class – The kernel class used by the layer. This can be as simple as gpflow.kernels.SquaredExponential, or more complex, for example, lambda **_: gpflow.kernels.Linear() + gpflow.kernels.Periodic(). It will be passed a lengthscales keyword argument.
  - z_init – The initial value for the inducing variable inputs.
  - name – The name for the GP layer.
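A minimal two-layer deep GP sketch built from this helper, assuming the DeepGP / LikelihoodLayer training workflow from the gpflux tutorials; the data, sizes, optimiser settings, and number of epochs are illustrative.

```python
import numpy as np
import tensorflow as tf
import gpflow
from gpflux.helpers import construct_gp_layer
from gpflux.layers import LikelihoodLayer
from gpflux.models import DeepGP

X = np.random.randn(200, 4)
Y = np.random.randn(200, 1)

layer1 = construct_gp_layer(num_data=len(X), num_inducing=32, input_dim=4, output_dim=2)
layer2 = construct_gp_layer(num_data=len(X), num_inducing=32, input_dim=2, output_dim=1)

model = DeepGP([layer1, layer2], LikelihoodLayer(gpflow.likelihoods.Gaussian()))
training_model = model.as_training_model()
training_model.compile(optimizer=tf.optimizers.Adam(0.01))
training_model.fit({"inputs": X, "targets": Y}, epochs=10, verbose=0)
```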
- make_dataclass_from_class(dataclass: Any, instance: object, **updates: object) Any[source]#
Take a regular object instance with a superset of fields for a dataclasses.dataclass (a @dataclass-decorated class), and return an instance of the dataclass. The instance has all of the dataclass's fields but might also have more. key=value keyword arguments supersede the fields in instance.
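A sketch of the intended usage; the LayerConfig and Settings classes below are hypothetical and made up purely for illustration.

```python
from dataclasses import dataclass
from gpflux.helpers import make_dataclass_from_class

@dataclass
class LayerConfig:          # hypothetical dataclass
    num_inducing: int
    lengthscale: float

class Settings:             # hypothetical plain object with a superset of the fields
    def __init__(self):
        self.num_inducing = 20
        self.lengthscale = 1.0
        self.unrelated_field = "ignored"

config = make_dataclass_from_class(LayerConfig, Settings(), num_inducing=50)
# -> LayerConfig(num_inducing=50, lengthscale=1.0); the keyword argument wins.
```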
- xavier_initialization_numpy(input_dim: int, output_dim: int) numpy.ndarray[source]#
Generate initial weights for a neural network layer with the given input and output dimensionality using the Xavier Glorot normal initialiser. From:
Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2010.
Draw samples from a normal distribution centred on \(0\) with standard deviation \(\sqrt{2 / (\text{input\_dim} + \text{output\_dim})}\).
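A quick sketch of using the result as the weight matrix of a Linear mean function; the dimensions are illustrative.

```python
import gpflow
from gpflux.helpers import xavier_initialization_numpy

input_dim, output_dim = 5, 2
W = xavier_initialization_numpy(input_dim, output_dim)  # shape [input_dim, output_dim]
mean_function = gpflow.mean_functions.Linear(A=W)
```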