markovflow.models.variational

Module containing a model for variational inference in GP classification.

Module Contents

class VariationalGaussianProcess(input_data: Tuple[tf.Tensor, tf.Tensor], kernel: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, initial_distribution: Optional[markovflow.gauss_markov.GaussMarkovDistribution] = None)[source]

Bases: markovflow.models.models.MarkovFlowModel

Approximates a GaussMarkovDistribution with a general likelihood using a Gaussian posterior.

The following notation is used:

  • \(x\) - the time points of the training data

  • \(y\) - observations corresponding to time points \(x\)

  • \(s(.)\) - the latent state of the Markov chain

  • \(f(.)\) - the noise free predictions of the model

  • \(p(y | f)\) - the likelihood

  • \(p(.)\) - the true distribution

  • \(q(.)\) - the variational distribution

Subscripts are used to denote dependence for notational convenience; for example, \(fₖ ≡ f(k)\).

The prior generative model comprises a Gauss-Markov distribution, an emission model and an arbitrary likelihood on the emitted variables, which together define:

  • \(p(xₖ₊₁| xₖ)\)

  • \(fₖ = H xₖ\)

  • \(p(yₖ | fₖ)\)

We would like to approximate the posterior of this generative model with a parametric model \(q\) from the same family of distributions as the prior.

To approximate the posterior, we maximise the evidence lower bound (ELBO) \(ℒ\) with respect to the parameters of the variational distribution, since:

\[log p(y) = ℒ(q) + KL[q ‖ p(f | y)]\]

…where:

\[ℒ(q) = ∫ log(p(f, y) / q(f)) q(f) df\]

Since the KL term is non-negative, the ELBO provides a lower bound on the log-likelihood of the model. The bound is tight when \(KL[q ‖ p(f | y)] = 0\); that is, when our approximation is flexible enough to capture the true posterior.

This turns inference into an optimisation problem: find the optimal \(q\).

To calculate the ELBO, we rewrite it as:

\[ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]\]

The first term is the ‘variational expectation’ of the model likelihood; the second is the KL divergence between the approximation \(q(f)\) and the prior \(p(f)\).
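
This decomposition follows from factorising the joint as \(p(f, y) = p(y | f) p(f)\) inside the logarithm (and, for the sum over \(i\), from the likelihood factorising across data points):

\[ℒ(q) = ∫ log(p(y | f)) q(f) df + ∫ log(p(f) / q(f)) q(f) df = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]\]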

Parameters
  • input_data – A tuple (time_points, observations) containing the observed data: time points with shape batch_shape + [num_data], and observations with shape batch_shape + [num_data, observation_dim] (see the construction sketch after this parameter list).

  • kernel – A kernel that defines a prior over functions.

  • likelihood – A likelihood.

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
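
As an illustration, a minimal construction sketch for a binary classification task. The Matern32 kernel from markovflow.kernels, gpflow's Bernoulli likelihood, the toy data and the hyperparameter values are assumptions made for this example, not requirements of the class:

    import numpy as np
    import tensorflow as tf
    from gpflow.likelihoods import Bernoulli

    from markovflow.kernels import Matern32
    from markovflow.models.variational import VariationalGaussianProcess

    # Toy 1D classification data: binary labels over time.
    time_points = np.linspace(0.0, 10.0, 100)
    observations = (np.sin(time_points) > 0.0).astype(np.float64)[:, None]

    input_data = (
        tf.constant(time_points, dtype=tf.float64),   # shape [num_data]
        tf.constant(observations, dtype=tf.float64),  # shape [num_data, 1]
    )

    vgp = VariationalGaussianProcess(
        input_data=input_data,
        kernel=Matern32(lengthscale=2.0, variance=1.0),
        likelihood=Bernoulli(),
    )

    elbo_value = vgp.elbo()  # scalar tf.Tensor

The mean_function and initial_distribution arguments are optional, as described in the parameter list above.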

elbo() → tf.Tensor[source]

Calculate the evidence lower bound (ELBO) on \(log p(y)\). We rewrite the ELBO as:

\[ℒ(q(x)) = Σᵢ ∫ log(p(yᵢ | fₓ)) q(fₓ) df - KL[q(sₓ) ‖ p(sₓ)]\]

The first term is the ‘variational expectation’ (VE); the second is the KL divergence between the approximation and the prior.

Returns

A scalar tensor (summed over the batch_shape dimension) representing the ELBO.

property time_points: tf.Tensor[source]

Return the time points of our observations.

Returns

A tensor with shape batch_shape + [num_data].

property observations: tf.Tensor[source]

Return the observations.

Returns

A tensor with shape batch_shape + [num_data, observation_dim].

property kernel: markovflow.kernels.SDEKernel[source]

Return the kernel of the GP.

property likelihood: gpflow.likelihoods.Likelihood[source]

Return the likelihood of the GP.

property mean_function: markovflow.mean_function.MeanFunction[source]

Return the mean function of the GP.

property dist_p: markovflow.gauss_markov.GaussMarkovDistribution[source]

Return the prior Gauss-Markov distribution.

property dist_q: markovflow.gauss_markov.GaussMarkovDistribution[source]

Return the variational distribution as a Gauss-Markov distribution.

property posterior: markovflow.posterior.PosteriorProcess[source]

Obtain a posterior process for inference.

For this class, this is the AnalyticPosteriorProcess built from the variational distribution. After optimisation it represents a locally optimal variational approximation of the posterior.
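
For illustration, a prediction sketch that assumes the returned posterior process exposes a predict_f method returning marginal means and covariances at new time points; the method name and signature are an assumption here, so check markovflow.posterior for the exact interface. vgp is the model from the construction sketch above:

    import numpy as np
    import tensorflow as tf

    # New time points at which to query the (approximate) posterior.
    new_time_points = tf.constant(np.linspace(0.0, 12.0, 50), dtype=tf.float64)

    # Assumed interface: marginal mean and variance of f at the new time points.
    mean, variance = vgp.posterior.predict_f(new_time_points)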

loss() → tf.Tensor[source]

Return the loss, which is the negative ELBO.
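
Because loss() is simply the negative ELBO, training reduces to standard gradient-based optimisation. A minimal sketch, assuming the vgp model from the construction sketch above and that the model exposes trainable_variables as a tf.Module; the optimiser, learning rate and step count are arbitrary choices for this example:

    import tensorflow as tf

    optimizer = tf.optimizers.Adam(learning_rate=0.01)

    @tf.function
    def optimisation_step():
        # Minimise the negative ELBO with respect to the variational
        # (and kernel/likelihood) parameters.
        with tf.GradientTape() as tape:
            loss = vgp.loss()
        gradients = tape.gradient(loss, vgp.trainable_variables)
        optimizer.apply_gradients(zip(gradients, vgp.trainable_variables))
        return loss

    for step in range(500):
        loss = optimisation_step()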