markovflow.models.variational
Module containing a model for variational inference, for GP classification.
VariationalGaussianProcess
Bases: markovflow.models.models.MarkovFlowModel
Approximates a GaussMarkovDistribution with a general likelihood using a Gaussian posterior.
The following notation is used:
\(x\) - the time points of the training data
\(y\) - observations corresponding to time points \(x\)
\(s(.)\) - the latent state of the Markov chain
\(f(.)\) - the noise free predictions of the model
\(p(y | f)\) - the likelihood
\(p(.)\) - the true distribution
\(q(.)\) - the variational distribution
Subscript is used to denote dependence for notational convenience, for example \(fₖ ≡ f(k)\).
The prior generative model comprises a Gauss-Markov distribution, an emission model, and an arbitrary likelihood on the emitted variables; these define:
\(p(xₖ₊₁| xₖ)\)
\(fₖ = H xₖ\)
\(p(yₖ | fₖ)\)
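A minimal numerical sketch of this generative structure (not markovflow's API; the transition matrix, process noise, emission matrix and Bernoulli likelihood below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear-Gaussian Markov prior: s_{k+1} | s_k ~ N(A s_k, Q).
A = np.array([[0.9, 0.1], [0.0, 0.9]])   # assumed transition matrix
Q = 0.05 * np.eye(2)                     # assumed process-noise covariance
H = np.array([[1.0, 0.0]])               # emission: f_k = H s_k

num_steps, state_dim = 50, 2
s = np.zeros((num_steps, state_dim))
for k in range(num_steps - 1):
    s[k + 1] = A @ s[k] + rng.multivariate_normal(np.zeros(state_dim), Q)

f = s @ H.T                              # noise-free predictions, shape [num_steps, 1]
# Arbitrary (non-Gaussian) likelihood, e.g. Bernoulli for classification:
p = 1.0 / (1.0 + np.exp(-f))
y = rng.binomial(1, p)                   # observations y_k ~ p(y_k | f_k)
```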
We would like to approximate the posterior of this generative model with a parametric model \(q\), from the same family of distributions as the prior.
To approximate the posterior, we maximise the evidence lower bound (ELBO) \(ℒ\) with respect to the parameters of the variational distribution, since:

\(log p(y) = ℒ(q) + KL[q(f) ‖ p(f | y)]\)

…where:

\(ℒ(q) = ∫ q(f) log (p(f, y) / q(f)) df\)
Since the last term is non-negative, the ELBO provides a lower bound on the log-likelihood of the model. This bound is exact when \(KL[q ‖ p(f | y)] = 0\); that is, when our approximation is flexible enough to capture the true posterior.
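For intuition, the bound can be checked numerically in a one-dimensional conjugate case (a sketch with assumed toy values, where the exact posterior and marginal likelihood are available in closed form):

```python
import numpy as np

y, s_p, s_n = 0.7, 2.0, 0.5   # assumed toy observation, prior variance, noise variance

def elbo(m, v):
    """ELBO for q(f) = N(m, v), prior N(0, s_p), likelihood N(y; f, s_n)."""
    ve = -0.5 * np.log(2 * np.pi * s_n) - ((y - m) ** 2 + v) / (2 * s_n)
    kl = 0.5 * (v / s_p + m ** 2 / s_p - 1.0 + np.log(s_p / v))
    return ve - kl

# Exact log marginal likelihood: log N(y; 0, s_p + s_n).
log_py = -0.5 * np.log(2 * np.pi * (s_p + s_n)) - y ** 2 / (2 * (s_p + s_n))

# Exact posterior: precisions add; mean is the noise-weighted observation.
v_star = 1.0 / (1.0 / s_p + 1.0 / s_n)
m_star = v_star * y / s_n

assert np.isclose(elbo(m_star, v_star), log_py)  # bound is tight at the posterior
assert elbo(0.0, 1.0) < log_py                   # strictly below for any other q
```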
This turns inference into an optimisation problem: find the optimal \(q\).
To calculate the ELBO, we rewrite it as:

\(ℒ(q) = Σₖ ∫ q(fₖ) log p(yₖ | fₖ) dfₖ − KL[q(f) ‖ p(f)]\)
The first term is the ‘variational expectation’ of the model likelihood; the second is the KL from the prior to the approximation.
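For a non-Gaussian likelihood the variational expectation has no closed form; one common approach (a sketch, not markovflow's internal implementation) is Gauss-Hermite quadrature against each marginal \(q(fₖ) = N(mₖ, vₖ)\), shown here for an assumed Bernoulli-sigmoid likelihood:

```python
import numpy as np

def variational_expectation(y, m, v, num_points=20):
    """E_{N(f; m, v)}[log p(y | f)] for a Bernoulli-sigmoid likelihood,
    via Gauss-Hermite quadrature (weight function e^{-x^2})."""
    x, w = np.polynomial.hermite.hermgauss(num_points)
    f = m + np.sqrt(2.0 * v) * x                    # quadrature nodes under q
    log_p = -np.logaddexp(0.0, -(2 * y - 1) * f)    # log sigmoid((2y-1) f), stable
    return (w * log_p).sum() / np.sqrt(np.pi)

# A confident q(f) agreeing with y = 1 gives a log-probability near zero:
ve = variational_expectation(y=1, m=3.0, v=0.1)
```

Summing this term over the data points and subtracting the KL term gives the ELBO above.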
input_data – A tuple (time_points, observations) containing the observed data: time points of observations with shape batch_shape + [num_data], and observations with shape batch_shape + [num_data, observation_dim].
kernel – A kernel that defines a prior over functions.
likelihood – A likelihood.
mean_function – The mean function for the GP. Defaults to no mean function.
initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
elbo
Calculate the evidence lower bound (ELBO), which lower-bounds \(log p(y)\). We rewrite the ELBO as:

\(ℒ(q) = Σₖ ∫ q(fₖ) log p(yₖ | fₖ) dfₖ − KL[q(f) ‖ p(f)]\)
The first term is the ‘variational expectation’ (VE); the second is the KL divergence from the prior to the approximation.
A scalar tensor (summed over the batch dimensions) representing the ELBO.
time_points
Return the time points of our observations.
A tensor with shape batch_shape + [num_data].
observations
Return the observations.
A tensor with shape batch_shape + [num_data, observation_dim].
kernel
Return the kernel of the GP.
likelihood
Return the likelihood of the GP.
mean_function
Return the mean function of the GP.
dist_p
Return the prior Gauss-Markov distribution.
dist_q
Return the variational distribution as a Gauss-Markov distribution.
posterior
Obtain a posterior process for inference.
For this class, this is the AnalyticPosteriorProcess built from the variational distribution. After optimisation, it will be a locally optimal variational approximation of the posterior.
loss
Return the loss, which is the negative ELBO.
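The "negative ELBO as loss" pattern can be sketched end-to-end in the conjugate one-dimensional case from above (assumed toy numbers; markovflow itself minimises this loss with a TensorFlow optimiser over the parameters of dist_q):

```python
import numpy as np

y, s_p, s_n = 0.7, 2.0, 0.5   # assumed toy observation, prior variance, noise variance
m, log_v = 0.0, 0.0           # variational parameters of q(f) = N(m, exp(log_v))

for _ in range(2000):         # gradient descent on the loss = -ELBO
    v = np.exp(log_v)
    dm = (y - m) / s_n - m / s_p                      # dELBO/dm
    dv = 0.5 * (1.0 / v - 1.0 / s_n - 1.0 / s_p)      # dELBO/dv
    m += 0.05 * dm
    log_v += 0.05 * dv * v                            # chain rule: dELBO/dlog_v = v dELBO/dv

# In this conjugate model the optimum is the exact posterior:
v_star = 1.0 / (1.0 / s_p + 1.0 / s_n)                # 0.4
m_star = v_star * y / s_n                             # 0.56
```

With a non-Gaussian likelihood no such closed form exists, which is why the model instead optimises the ELBO numerically.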