markovflow.models.sparse_variational

Module containing a model for sparse variational inference, for use with large data sets.

Module Contents

class SparseVariationalGaussianProcess(kernel: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, inducing_points: tf.Tensor, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, num_data: Optional[int] = None, initial_distribution: Optional[markovflow.gauss_markov.GaussMarkovDistribution] = None)[source]

Bases: markovflow.models.models.MarkovFlowSparseModel

Approximate a GaussMarkovDistribution with a general likelihood using a Gaussian posterior. Additionally uses a number of pseudo, or inducing, points to represent the distribution over a typically larger number of data points.

The following notation is used:

  • \(x\) - the time points of the training data

  • \(z\) - the time points of the inducing/pseudo points

  • \(y\) - observations corresponding to time points \(x\)

  • \(s(.)\) - the latent state of the Markov chain

  • \(f(.)\) - the noise free predictions of the model

  • \(p(y | f)\) - the likelihood

  • \(p(.)\) - the true distribution

  • \(q(.)\) - the variational distribution

Subscripts are used to denote dependence for notational convenience; for example, \(fₖ ≡ f(k)\).

The prior generative model comprises a Gauss-Markov distribution, an emission model and an arbitrary likelihood on the emitted variables. Together these define:

  • \(p(xₖ₊₁| xₖ)\)

  • \(fₖ = H xₖ\)

  • \(p(yₖ | fₖ)\)

As per a VariationalGaussianProcess (VGP) model, we have:

\[\begin{aligned}
&log p(y) ≥ ℒ(q)\\
&ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]
\end{aligned}\]

…where \(f\) is defined over the entire function space.

Here this reduces to the evidence lower bound (ELBO) defined jointly over both the data \(x\) and the inducing points \(z\), which we rewrite as:

\[ℒ(q(x, z)) = Σᵢ ∫ log(p(yᵢ | fₓ)) q(fₓ) df - KL[q(f(z)) ‖ p(f(z))]\]

This turns the inference problem into an optimisation problem: find the optimal \(q\).

The first term is the sum of variational expectations and has the same form as in a VGP model. However, we must now use the inducing states to predict the marginals of the variational distribution at the original data points.

The second term is the KL divergence between the variational distribution and the prior, evaluated at the inducing points.
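The integral in the first term is simply an expectation of the log-likelihood under the marginals of the variational distribution. When the model is trained on minibatches of size \(B\) drawn from \(N\) observations in total, the data term is usually rescaled so that the minibatch objective is an unbiased estimate of the full sum. Under that standard assumption (not stated explicitly on this page), the objective becomes:

\[ℒ(q(x, z)) ≈ (N/B) Σᵢ ∫ log(p(yᵢ | fₓ)) q(fₓ) df - KL[q(s(z)) ‖ p(s(z))]\]

…where the sum runs over the minibatch and \(N\) corresponds to the num_data parameter below.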

The key reference is:

@inproceedings{adam2020doubly,
    title={Doubly Sparse Variational Gaussian Processes},
    author={Adam, Eleftheriadis, Artemev, Durrande, Hensman},
    booktitle={International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year={2020}
}

Note

Since this class extends MarkovFlowSparseModel, it does not depend on input data. Input data is passed during the optimisation step as a tuple of time points and observations.

Parameters
  • kernel – A kernel that defines a prior over functions.

  • likelihood – A likelihood.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • num_data – The total number of observations (relevant when feeding in external minibatches).

  • initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
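As a usage illustration, the following sketch constructs a model from these parameters. The constructor call follows the signature documented above, but the Matern32 kernel, its keyword arguments and all numeric values are assumptions made for the sake of the example.

# Minimal construction sketch; kernel choice, kwargs and values are assumptions.
import numpy as np
import tensorflow as tf
import gpflow

from markovflow.kernels import Matern32  # assumed kernel choice
from markovflow.models.sparse_variational import SparseVariationalGaussianProcess

# Inducing time points with shape [num_inducing] (no batch dimensions here).
inducing_points = tf.constant(np.linspace(0.0, 10.0, 20))

kernel = Matern32(lengthscale=1.0, variance=1.0)        # assumed constructor kwargs
likelihood = gpflow.likelihoods.Gaussian(variance=0.1)

svgp = SparseVariationalGaussianProcess(
    kernel=kernel,
    likelihood=likelihood,
    inducing_points=inducing_points,
    num_data=1000,  # total number of observations, relevant for minibatching
)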

elbo(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]

Calculates the evidence lower bound (ELBO), which is a lower bound on \(log p(y)\). We rewrite this as:

\[ℒ(q(x, z)) = Σᵢ ∫ log(p(yᵢ | fₓ)) q(fₓ) df - KL[q(s(z)) ‖ p(s(z))]\]

The first term is the ‘variational expectation’ (VE) and has the same form as in a VariationalGaussianProcess (VGP) model. However, we must now use the inducing states to predict the marginals of the variational distribution at the original data points.

The second term is the KL divergence between the variational distribution and the prior, evaluated at the inducing points.

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor (summed over the batch_shape dimension) representing the ELBO.
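Continuing the construction sketch above, evaluating the ELBO on a batch of data might look as follows; the data arrays here are placeholders for illustration only.

# Placeholder data: time points of shape [num_data] and observations of
# shape [num_data, observation_dim].
time_points = tf.constant(np.linspace(0.0, 10.0, 100))
observations = tf.constant(np.sin(np.linspace(0.0, 10.0, 100))[:, None])

elbo_value = svgp.elbo((time_points, observations))  # scalar tf.Tensor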

property time_points: tf.Tensor[source]

Return the time points of the sparse process, which are the locations of the inducing points.

Returns

A tensor with shape batch_shape + [num_inducing]. Same as inducing inputs.

property kernel: markovflow.kernels.SDEKernel[source]

Return the kernel of the GP.

property likelihood: gpflow.likelihoods.Likelihood[source]

Return the likelihood of the GP.

property mean_function: markovflow.mean_function.MeanFunction[source]

Return the mean function of the GP.

property dist_p: markovflow.gauss_markov.GaussMarkovDistribution[source]

Return the prior Gauss-Markov distribution.

property dist_q: markovflow.gauss_markov.GaussMarkovDistribution[source]

Return the variational distribution as a Gauss-Markov distribution.

property posterior: markovflow.posterior.PosteriorProcess[source]

Obtain a posterior process for inference.

For this class, this is the AnalyticPosteriorProcess built from the variational distribution. After optimisation, this will be a locally optimal variational approximation of the posterior.

loss(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]

Return the loss, which is the negative evidence lower bound (ELBO).

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim].
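As the note above indicates, data is fed in at optimisation time. The following sketch of a minibatch training loop reuses the svgp model and the placeholder data from the earlier sketches, and assumes the model exposes trainable_variables as a tf.Module (as gpflow-based models typically do).

# Minibatch training sketch: minimise the negative ELBO returned by `loss`.
optimizer = tf.optimizers.Adam(learning_rate=0.01)

dataset = (
    tf.data.Dataset.from_tensor_slices((time_points, observations))
    .shuffle(100)
    .batch(32)
    .repeat()
)

@tf.function
def training_step(batch):
    with tf.GradientTape() as tape:
        loss_value = svgp.loss(batch)  # negative ELBO on this minibatch
    gradients = tape.gradient(loss_value, svgp.trainable_variables)  # assumes tf.Module
    optimizer.apply_gradients(zip(gradients, svgp.trainable_variables))
    return loss_value

for batch in dataset.take(500):
    training_step(batch)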

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor[source]

Compute the log density of the data at the new data points.
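For example, with held-out data in the same (time points, observations) format and the svgp model from the sketches above (the test arrays are placeholders):

# Held-out data placeholders in the same format as the training tuples.
test_times = tf.constant(np.linspace(10.0, 12.0, 20))
test_observations = tf.constant(np.sin(np.linspace(10.0, 12.0, 20))[:, None])

log_density = svgp.predict_log_density((test_times, test_observations))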