markovflow.models.variational_cvi
Module containing models for conjugate-computation variational inference (CVI).
GaussianProcessWithSitesBase
Bases: markovflow.models.models.MarkovFlowModel
Base class for site-based Gaussian process approximations such as EP and CVI.
The following notation is used:
\(x\) - the time points of the training data
\(y\) - observations corresponding to time points \(x\)
\(s(.)\) - the latent state of the Markov chain
\(f(.)\) - the noise free predictions of the model
\(p(y | f)\) - the likelihood
\(t(f)\) - a site (indices will refer to the associated data point)
\(p(.)\) - the prior distribution
\(q(.)\) - the variational distribution
We use the state space formulation of Markovian Gaussian Processes that specifies:
The conditional density of neighbouring latent states \(p(sₖ₊₁| sₖ)\)
How to read out the latent process from these states \(fₖ = H sₖ\)
The likelihood \(p(yₖ | fₖ)\) links the data to the latent process. We would like to approximate the posterior over the latent state space model.
We parameterize the approximate posterior using sites \(tₖ(fₖ)\):

\(q(s) ∝ p(s) ∏ₖ tₖ(fₖ)\)

…where \(tₖ(fₖ)\) are univariate Gaussian sites parameterized in the natural form:

\(tₖ(f) = exp(𝜽ₖᵀ 𝛗(f))\)

…and where \(𝜽=[θ₁,θ₂]\) and \(𝛗(f)=[f,f²]\).

Here, \(𝛗(f)\) are the sufficient statistics and \(𝜽\) are the natural parameters.
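The correspondence between this natural form and the familiar mean/variance form of a Gaussian site can be sketched as follows (helper names are illustrative, not part of markovflow): \(θ₁ = μ/σ²\) and \(θ₂ = -1/(2σ²)\).

```python
import numpy as np

# Illustrative helpers (not markovflow API): convert between the natural
# parameters (theta1, theta2) of a univariate Gaussian site
# exp(theta1*f + theta2*f^2) and its mean/variance form.
def mean_var_to_natural(mu, var):
    # theta1 = mu / sigma^2, theta2 = -1 / (2 sigma^2)
    return mu / var, -0.5 / var

def natural_to_mean_var(theta1, theta2):
    var = -0.5 / theta2
    return theta1 * var, var

mu, var = 1.5, 0.25
theta1, theta2 = mean_var_to_natural(mu, var)   # (6.0, -2.0)
mu2, var2 = natural_to_mean_var(theta1, theta2)
assert np.allclose([mu2, var2], [mu, var])      # round trip is exact
```

Note that \(θ₂ < 0\) is required for the site to be a normalizable Gaussian.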
input_data –
A tuple containing the observed data:
Time points of observations with shape batch_shape + [num_data]
Observations with shape batch_shape + [num_data, observation_dim]
kernel – A kernel that defines a prior over functions.
likelihood – A likelihood with shape batch_shape + [num_inducing].
mean_function – The mean function for the GP. Defaults to no mean function.
dist_q
Construct the StateSpaceModel representation of the posterior process indexed at the time points.
posterior_kalman
Build the Kalman filter object from the prior state space models and the sites.
posterior
Return a posterior object that can predict outside of the training time points.
log_likelihood
Calculate \(log p(y)\).
A scalar tensor representing the ELBO.
time_points
Return the time points of the observations.
A tensor with shape batch_shape + [num_data].
conditioning_points
observations
Return the observations.
A tensor with shape batch_shape + [num_data, observation_dim].
kernel
Return the kernel.
likelihood
Return the likelihood.
mean_function
Return the mean function.
dist_p
Return the prior Gauss-Markov distribution.
loss
Return the loss, which is the negative ELBO.
CVIGaussianProcess
Bases: GaussianProcessWithSitesBase
Provides an alternative parameterization to a VariationalGaussianProcess.
This class approximates the posterior of a model with a GP prior and a general likelihood using a Gaussian posterior parameterized with Gaussian sites.
\(x\) - the time points of the training data
\(y\) - observations corresponding to time points \(x\)
\(s(.)\) - the latent state of the Markov chain
\(f(.)\) - the noise free predictions of the model
\(p(y | f)\) - the likelihood
\(t(f)\) - a site (indices will refer to the associated data point)
\(p(.)\) - the prior distribution
\(q(.)\) - the variational distribution
To approximate the posterior, we maximise the evidence lower bound (ELBO) \(ℒ\) with respect to the parameters of the variational distribution, since:

\(log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]\)

…where:

\(ℒ(q) = ∫ q(s) log(p(s, y) / q(s)) ds\)

We parameterize the variational posterior through sites \(tₖ(fₖ)\):

\(q(s) ∝ p(s) ∏ₖ tₖ(fₖ), tₖ(fₖ) = exp(𝜽ₖᵀ 𝛗(fₖ))\)

Here, \(𝛗(f)\) are the sufficient statistics and \(𝜽\) are the natural parameters. Note that the subscript \(k\) has been omitted for simplicity.

The natural gradient update of the sites can be shown to be the gradient of the variational expectations:

\(𝐠 = ∇_𝛈 Eq(f)[log p(y | f)]\)

…with respect to the expectation parameters:

\(𝛈 = Eq(f)[𝛗(f)] = [μ, σ² + μ²]\)

That is, \(𝜽 ← (1-ρ)𝜽 + ρ𝐠\), where \(ρ\) is the learning rate.
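This update can be checked end to end on a toy model with a single data point and a conjugate Gaussian likelihood, where the gradient \(𝐠\) is constant and one step with \(ρ = 1\) recovers the exact posterior. The sketch below is illustrative only (plain numpy, not markovflow code):

```python
import numpy as np

# One data point, conjugate model: f ~ N(prior_mu, prior_var),
# y | f ~ N(f, noise_var).
y, noise_var = 0.7, 0.1
prior_mu, prior_var = 0.0, 1.0

# Writing E_q[log N(y | f, noise_var)] in the expectation parameters
# [mu, var + mu^2] as const + y*mu/noise_var - (var + mu^2)/(2*noise_var),
# its gradients are constants, independent of q:
g1 = y / noise_var       # d/d(mu)
g2 = -0.5 / noise_var    # d/d(var + mu^2)

# Site update theta <- (1 - rho)*theta + rho*g from theta = 0, rho = 1:
theta1, theta2 = g1, g2

# q(f) ∝ p(f) * exp(theta1*f + theta2*f^2): add natural parameters.
post_prec = 1.0 / prior_var - 2.0 * theta2
post_var = 1.0 / post_prec
post_mu = post_var * (prior_mu / prior_var + theta1)

# Exact Bayes posterior for this conjugate model:
exact_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
exact_mu = exact_var * (prior_mu / prior_var + y / noise_var)
assert np.isclose(post_var, exact_var) and np.isclose(post_mu, exact_mu)
```

For non-conjugate likelihoods the gradients depend on \(q\), so several damped updates (\(ρ < 1\)) are needed rather than a single exact step.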
The key reference is:
@inproceedings{khan2017conjugate,
  title={Conjugate-Computation Variational Inference: Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models},
  author={Khan, Mohammad and Lin, Wu},
  booktitle={Artificial Intelligence and Statistics},
  pages={878--887},
  year={2017}
}
learning_rate – The learning rate of the algorithm.
local_objective
Calculate the local objective in CVI: the expected log-likelihood under a Gaussian distribution over \(f\).
Fmu – Means with shape [..., latent_dim].
Fvar – Variances with shape [..., latent_dim].
Y – Observations with shape [..., observation_dim].
A local objective with shape [...].
local_objective_and_gradients
Return the local objective and its gradients with regard to the expectation parameters.
Fmu – Means \(μ\) with shape [..., latent_dim].
Fvar – Variances \(σ²\) with shape [..., latent_dim].
A local objective and gradient with regard to \([μ, σ² + μ²]\).
update_sites
Perform one joint update of the Gaussian sites. That is:

\(𝜽ₖ ← (1-ρ)𝜽ₖ + ρ𝐠ₖ\)
elbo
Calculate the evidence lower bound (ELBO) of \(log p(y)\).
This is done by computing the marginal of the model in which the likelihood terms were replaced by the Gaussian sites.
classic_elbo
Compute the ELBO the classic way. That is:

\(ℒ = Σₖ Eq(fₖ)[log p(yₖ | fₖ)] - KL[q(s) ‖ p(s)]\)
Note
This is mostly for testing purposes and should not be used for optimization.
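A useful property for such testing is that, for a conjugate Gaussian model with \(q\) set to the exact posterior, the classic ELBO equals the log marginal likelihood. The following sketch (plain numpy, illustrative only) verifies this for a single observation:

```python
import numpy as np

def log_gauss(x, mu, var):
    # Log density of a univariate Gaussian.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

y, noise_var = 0.7, 0.1
prior_mu, prior_var = 0.0, 1.0

# Exact posterior, which is the optimal q for this conjugate model.
q_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
q_mu = q_var * (prior_mu / prior_var + y / noise_var)

# E_q[log N(y | f, noise_var)] in closed form: the extra q_var term
# comes from E_q[(y - f)^2] = (y - q_mu)^2 + q_var.
exp_loglik = log_gauss(y, q_mu, noise_var) - 0.5 * q_var / noise_var

# KL divergence between two univariate Gaussians, KL[q || p].
kl = (0.5 * np.log(prior_var / q_var)
      + (q_var + (q_mu - prior_mu) ** 2) / (2 * prior_var) - 0.5)

elbo = exp_loglik - kl
log_marginal = log_gauss(y, prior_mu, prior_var + noise_var)
assert np.isclose(elbo, log_marginal)  # bound is tight at the exact posterior
```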
predict_log_density
Compute the log density of the data at the new data points.

input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model: a tensor of inputs with shape batch_shape + [num_data], and a tensor of observations with shape batch_shape + [num_data, observation_dim].
full_output_cov – Either full output covariance (True) or marginal variances (False).
back_project_nats
Transform the natural parameters \(𝜽f\) of a Gaussian with sufficient statistics \(𝛗(f)=[f,f²]\) into the equivalent rank-one natural parameters \(𝜽g\) of a (thus degenerate) Gaussian with sufficient statistics \(𝛗(g)=[g,ggᵀ]\), where \(f = Cg\).

In practice, \([θg₁, θg₂] = [θf₁ C, θf₂ CᵀC]\).
nat1 – First natural parameter \(θf₁\) with size (num_time_points, 1).
nat2 – Second natural parameter \(θf₂\) with size (num_time_points, 1).
C – projection with size (num_time_points, 1, project_dim)
A tuple of natural parameters with sizes (num_time_points, project_dim) and (num_time_points, project_dim, project_dim).
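The back-projection can be checked numerically: for \(f = Cg\), the exponent \(θf₁f + θf₂f²\) must equal \(θg₁ᵀg + gᵀθg₂g\). The sketch below uses plain numpy with illustrative names, not the markovflow implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, P = 5, 3  # num_time_points, project_dim

nat1 = rng.normal(size=(T, 1))                 # theta_f1
nat2 = -rng.uniform(0.1, 1.0, size=(T, 1))     # theta_f2 (negative)
C = rng.normal(size=(T, 1, P))                 # projection, f = C g

# Back-project: theta_g1 = theta_f1 * C, theta_g2 = theta_f2 * C^T C.
theta_g1 = (nat1[..., None] * C)[:, 0, :]      # (T, P)
CtC = np.einsum('tip,tiq->tpq', C, C)          # C^T C per time point
theta_g2 = nat2[..., None] * CtC               # (T, P, P)

# The exponents agree for an arbitrary g.
g = rng.normal(size=(T, P))
f = np.einsum('tip,tp->t', C, g)
lhs = nat1[:, 0] * f + nat2[:, 0] * f ** 2
rhs = (np.einsum('tp,tp->t', theta_g1, g)
       + np.einsum('tp,tpq,tq->t', g, theta_g2, g))
assert np.allclose(lhs, rhs)
```

Since \(θg₂ = θf₂CᵀC\) is rank one, the projected Gaussian is degenerate, as noted above.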
gradient_transformation_mean_var_to_expectation
Transform gradients \(𝐠\) of a function with regard to \([μ, σ²]\) into its gradients with regard to the expectation parameters \([μ, σ² + μ²]\).
inputs – Means and variances \([μ, σ²]\).
grads – Gradients \(𝐠\).
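Since \(μ = η₁\) and \(σ² = η₂ - η₁²\), the chain rule gives \([∂L/∂η₁, ∂L/∂η₂] = [g_μ - 2μ·g_{σ²}, g_{σ²}]\). A minimal finite-difference check of this transformation on an arbitrary smooth objective (illustrative code, not the markovflow implementation):

```python
import numpy as np

def L(mu, var):
    # Arbitrary smooth objective of mean and variance.
    return np.sin(mu) * var + mu ** 2

mu, var = 0.3, 0.5
g_mu = np.cos(mu) * var + 2 * mu   # dL/dmu
g_var = np.sin(mu)                 # dL/dvar

# Transformed gradients w.r.t. eta = [mu, var + mu^2].
g_eta = np.array([g_mu - 2 * mu * g_var, g_var])

# Finite-difference check on L expressed in the eta parameterization,
# using mu = eta1 and var = eta2 - eta1^2.
eta = np.array([mu, var + mu ** 2])
eps = 1e-6
fd = np.zeros(2)
for i in range(2):
    e = np.zeros(2)
    e[i] = eps
    up, dn = eta + e, eta - e
    fd[i] = (L(up[0], up[1] - up[0] ** 2)
             - L(dn[0], dn[1] - dn[0] ** 2)) / (2 * eps)
assert np.allclose(g_eta, fd, atol=1e-5)
```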