markovflow.models.sparse_variational
Module containing a model for sparse variational inference, for use with large data sets.
SparseVariationalGaussianProcess
Bases: markovflow.models.models.MarkovFlowSparseModel
Approximate a GaussMarkovDistribution with a general likelihood using a Gaussian posterior. It additionally uses a number of pseudo, or inducing, points to represent the distribution over a typically larger number of data points.
The following notation is used:
\(x\) - the time points of the training data
\(z\) - the time points of the inducing/pseudo points
\(y\) - observations corresponding to time points \(x\)
\(s(.)\) - the latent state of the Markov chain
\(f(.)\) - the noise free predictions of the model
\(p(y | f)\) - the likelihood
\(p(.)\) - the true distribution
\(q(.)\) - the variational distribution
Subscripts denote dependence on a time index for notational convenience; for example, \(fₖ ≡ f(k)\).
The prior generative model comprises a Gauss-Markov distribution, an emission model and an arbitrary likelihood on the emitted variables; that is:
\(p(xₖ₊₁| xₖ)\)
\(fₖ = H xₖ\)
\(p(yₖ | fₖ)\)
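The generative model above can be made concrete with a small NumPy sketch (an illustration only, not markovflow code; the transition matrix, noise covariance and emission matrix below are made-up toy values):

```python
import numpy as np

# Illustrative sketch of the prior generative model: sample a
# linear-Gaussian Markov chain s_{k+1} = A s_k + noise, then emit
# the noise-free predictions f_k = H s_k.
rng = np.random.default_rng(0)

state_dim, num_steps = 2, 5
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # toy state-transition matrix
Q = 0.01 * np.eye(state_dim)             # toy process-noise covariance
H = np.array([[1.0, 0.0]])               # toy emission: observe first state

s = np.zeros((num_steps, state_dim))
for k in range(num_steps - 1):
    noise = rng.multivariate_normal(np.zeros(state_dim), Q)
    s[k + 1] = A @ s[k] + noise          # draw from p(s_{k+1} | s_k)

f = s @ H.T                              # f_k = H s_k, shape [num_steps, 1]
```

Observations \(yₖ\) would then be drawn from the likelihood \(p(yₖ | fₖ)\), for example Gaussian noise around \(fₖ\).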
As per a VariationalGaussianProcess (VGP) model, we maximise a lower bound on the log marginal likelihood:
\(log p(y) ≥ ∫ q(f) log [p(y | f) p(f) / q(f)] df\)
…where \(f\) is defined over the entire function space.
Here this reduces to an evidence lower bound (ELBO) defined jointly over the data \(x\) and the inducing points \(z\), which we rewrite as:
\(ELBO = Σₖ 𝔼_{q(fₖ)}[log p(yₖ | fₖ)] − KL[q(s(z)) ‖ p(s(z))]\)
This turns the inference problem into an optimisation problem: find the optimal \(q\).
The first term comprises the variational expectations, which have the same form as in a VGP model. However, we must now use the inducing states to predict the marginals of the variational distribution at the original data points.
The second is the KL from the prior to the approximation, but evaluated at the inducing points.
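The two terms of the ELBO can be illustrated numerically for the simplest case of a Gaussian likelihood and univariate Gaussian marginals (a toy sketch, not the markovflow implementation; all numbers below are made up):

```python
import numpy as np

# Toy illustration of the two ELBO terms: variational expectations under
# a Gaussian likelihood, minus a KL from the prior to the approximation.

def variational_expectation(y, mu, var, noise_var):
    # E_{q(f)}[log N(y | f, noise_var)] with q(f) = N(mu, var), closed form
    return -0.5 * np.log(2 * np.pi * noise_var) - ((y - mu) ** 2 + var) / (2 * noise_var)

def kl_gauss(m_q, v_q, m_p, v_p):
    # KL[ N(m_q, v_q) || N(m_p, v_p) ] for univariate Gaussians
    return 0.5 * (np.log(v_p / v_q) + (v_q + (m_q - m_p) ** 2) / v_p - 1.0)

y = np.array([0.1, -0.3, 0.5])        # observations at the data points x
q_mu = np.array([0.0, -0.2, 0.4])     # marginals of q(f) predicted at x
q_var = np.array([0.05, 0.04, 0.06])
noise_var = 0.1

# KL evaluated at a (toy, univariate) inducing state at z
elbo = variational_expectation(y, q_mu, q_var, noise_var).sum() - kl_gauss(0.1, 0.05, 0.0, 1.0)
```

For non-Gaussian likelihoods the expectation generally has no closed form and is computed by quadrature instead.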
The key reference is:

@inproceedings{adam2020doubly,
  title={Doubly Sparse Variational Gaussian Processes},
  author={Adam, Vincent and Eleftheriadis, Stefanos and Artemev, Artem and Durrande, Nicolas and Hensman, James},
  booktitle={International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year={2020}
}
Note
Since this class extends MarkovFlowSparseModel, it does not depend on input data. Input data is passed during the optimisation step as a tuple of time points and observations.
kernel – A kernel that defines a prior over functions.
likelihood – A likelihood.
inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].
mean_function – The mean function for the GP. Defaults to no mean function.
num_data – The total number of observations; relevant when training on external minibatches.
initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
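The role of num_data can be illustrated with a small NumPy sketch (an illustration of the standard minibatch-scaling idea, not markovflow code; the per-point values below are made up): with a minibatch of size \(m\) out of num_data points, the sum of per-point variational expectations is rescaled by num_data / \(m\) so that the minibatch ELBO is an unbiased estimate of the full-data ELBO.

```python
import numpy as np
from itertools import combinations

# Toy per-point variational-expectation terms for num_data = 4 points.
per_point_ve = np.array([-1.2, -0.7, -0.9, -1.1])
num_data, m = len(per_point_ve), 2

def minibatch_estimate(batch_idx):
    # Rescale the minibatch sum by num_data / m to estimate the full sum.
    return (num_data / m) * per_point_ve[list(batch_idx)].sum()

# Averaging the scaled estimate over all minibatches recovers the full sum.
estimates = [minibatch_estimate(b) for b in combinations(range(num_data), m)]
```

This is why the model must know num_data: without the rescaling, minibatch training would systematically under-weight the data term relative to the KL term.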
elbo
Calculate the evidence lower bound (ELBO), a lower bound on \(log p(y)\). We rewrite this as:
\(ELBO = Σₖ 𝔼_{q(fₖ)}[log p(yₖ | fₖ)] − KL[q(s(z)) ‖ p(s(z))]\)
The first term is the ‘variational expectation’ (VE), and has the same form as in a VariationalGaussianProcess (VGP) model. However, we must now use the inducing states to predict the marginals of the variational distribution at the original data points.
The second is the KL divergence from the prior to the approximation, but evaluated at the inducing points.
input_data –
A tuple of time points and observations containing the data at which to calculate the loss for training the model:
A tensor of inputs with shape batch_shape + [num_data]
A tensor of observations with shape batch_shape + [num_data, observation_dim]
A scalar tensor (summed over the batch_shape dimension) representing the ELBO.
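The KL term of the ELBO can be sketched for dense multivariate Gaussians (an illustration only; the markovflow implementation exploits the Markov structure of the distributions rather than forming dense covariances, and the numbers below are made up):

```python
import numpy as np

# Dense (illustrative) KL[ q || p ] between two multivariate Gaussians,
# evaluated at the inducing points.

def kl_mvn(m_q, S_q, m_p, S_p):
    k = len(m_q)
    S_p_inv = np.linalg.inv(S_p)
    diff = m_p - m_q
    return 0.5 * (np.trace(S_p_inv @ S_q) + diff @ S_p_inv @ diff - k
                  + np.log(np.linalg.det(S_p) / np.linalg.det(S_q)))

m_q = np.array([0.1, -0.2])                 # toy variational mean at z
S_q = np.array([[0.5, 0.1], [0.1, 0.4]])    # toy variational covariance
m_p = np.zeros(2)                           # toy prior mean at z
S_p = np.eye(2)                             # toy prior covariance
kl = kl_mvn(m_q, S_q, m_p, S_p)
```

Because the KL is evaluated only at the inducing points, its cost scales with the number of inducing points rather than the number of data points.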
time_points
Return the time points of the sparse process; these are the locations of the inducing points.
A tensor with shape batch_shape + [num_inducing]. Same as inducing inputs.
kernel
Return the kernel of the GP.
likelihood
Return the likelihood of the GP.
mean_function
Return the mean function of the GP.
dist_p
Return the prior Gauss-Markov distribution.
dist_q
Return the variational distribution as a Gauss-Markov distribution.
posterior
Obtain a posterior process for inference.
For this class, this is the AnalyticPosteriorProcess built from the variational distribution. After optimisation it will be a locally optimal variational approximation of the posterior.
loss
Return the loss, which is the negative evidence lower bound (ELBO).
input_data – A tuple of time points and observations, as for elbo: a tensor of inputs with shape batch_shape + [num_data], and a tensor of observations with shape batch_shape + [num_data, observation_dim].
predict_log_density
Compute the log density of the data at the new data points.
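For the special case of a Gaussian likelihood, the predictive log density has a closed form that a small sketch can demonstrate (an illustration only, not the markovflow implementation; the values below are made up): \(log ∫ p(y^* | f) q(f) df = log N(y^*; μ, v + σ²)\), where \(N(μ, v)\) is the posterior marginal of \(f\) at the new point and \(σ²\) is the observation-noise variance.

```python
import numpy as np

# Predictive log density under a Gaussian likelihood: marginalising the
# Gaussian posterior over f inflates the variance by the noise variance.

def predictive_log_density(y_new, f_mean, f_var, noise_var):
    total_var = f_var + noise_var
    return -0.5 * (np.log(2 * np.pi * total_var) + (y_new - f_mean) ** 2 / total_var)

log_density = predictive_log_density(y_new=0.3, f_mean=0.25, f_var=0.02, noise_var=0.1)
```

For non-Gaussian likelihoods the integral is generally intractable and is approximated, typically by quadrature.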