markovflow.models.sparse_variational_cvi
Module containing a model for Sparse CVI
SparseCVIGaussianProcess
Bases: markovflow.models.models.MarkovFlowSparseModel
This is an alternative parameterization to the SparseVariationalGaussianProcess
Approximates the posterior of a model with a GP prior and a general likelihood using a Gaussian posterior parameterized with Gaussian sites on the inducing states u at the inducing points z.
The following notation is used:
x - the time points of the training data.
z - the time points of the inducing/pseudo points.
y - observations corresponding to time points x.
s(.) - the continuous time latent state process
u = s(z) - the discrete inducing latent state space model
f(.) - the noise free predictions of the model
p(y | f) - the likelihood
t(u) - a site (indices will refer to the associated data point)
p(.) - the prior distribution
q(.) - the variational distribution
We use the state space formulation of Markovian Gaussian Processes that specifies:
the conditional density of neighbouring latent states: p(sₖ₊₁| sₖ)
how to read out the latent process from these states: fₖ = H sₖ
The likelihood p(yₖ | fₖ) links the data to the latent process. We would like to approximate the posterior over the latent states of this model.
To approximate the posterior, we maximise the evidence lower bound (ELBO) (ℒ) with respect to the parameters of the variational distribution, since:
log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]
…where:
ℒ(q) = ∫ log(p(s, y) / q(s)) q(s) ds
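The decomposition above can be checked numerically on a toy problem. The following is a minimal sketch, assuming a scalar latent state with a zero-mean Gaussian prior and a Gaussian likelihood so that every term is available in closed form; the helper names and numeric values are hypothetical and are not part of markovflow.

```python
import math


def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gauss_kl(m1, v1, m2, v2):
    """KL[N(m1, v1) || N(m2, v2)] for univariate Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


# Toy model (illustrative values): p(s) = N(0, prior_var), p(y | s) = N(s, noise_var).
prior_var, noise_var, y = 2.0, 0.5, 1.3

# Exact posterior p(s | y) and log evidence log p(y) of this conjugate model.
post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
post_mean = post_var * y / noise_var
log_evidence = gauss_logpdf(y, 0.0, prior_var + noise_var)


def elbo(q_mean, q_var):
    """ELBO = E_q[log p(y | s)] - KL[q(s) || p(s)] for q = N(q_mean, q_var)."""
    expected_loglik = (
        -0.5 * math.log(2.0 * math.pi * noise_var)
        - ((y - q_mean) ** 2 + q_var) / (2.0 * noise_var)
    )
    return expected_loglik - gauss_kl(q_mean, q_var, 0.0, prior_var)


# An arbitrary variational distribution q(s) and its gap to the exact posterior.
q_mean, q_var = 0.4, 0.9
gap = gauss_kl(q_mean, q_var, post_mean, post_var)

print(log_evidence, elbo(q_mean, q_var) + gap)
```

Because the KL term is non-negative, the ELBO never exceeds log p(y), and in this toy model it reaches log p(y) when q equals the exact posterior.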
We parameterize the variational posterior through M sites tₘ(vₘ):
q(s) = p(s) ∏ₘ tₘ(vₘ)
where tₘ(vₘ) are multivariate Gaussian sites on vₘ = [uₘ, uₘ₊₁], i.e. consecutive inducing states.
The sites are parameterized in the natural form:
t(v) = exp(𝜽ᵀ𝛗(v) - A(𝜽)), where 𝜽=[θ₁, θ₂] and 𝛗(v)=[Wv, WᵀvᵀvW]
Here 𝛗(v) are the sufficient statistics, 𝜽 are the natural parameters, and W is the projection that gives the conditional mean E_p(f|v)[f] = W v.
Each data point indexed k contributes a fraction of the site it belongs to. If vₘ = [uₘ, uₘ₊₁], and zₘ < xₖ <= zₘ₊₁, then xₖ belongs to vₘ.
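The assignment rule zₘ < xₖ <= zₘ₊₁ amounts to a left bisection into the sorted inducing points. The following is a minimal sketch using Python's standard library with 0-based site indices; the function name `site_index` is hypothetical, and raising an error for points outside the inducing range is an assumption of this sketch, not documented library behaviour.

```python
from bisect import bisect_left


def site_index(z, x_k):
    """Return m such that z[m] < x_k <= z[m + 1], for sorted inducing points z."""
    # bisect_left returns i with z[i - 1] < x_k <= z[i], hence m = i - 1.
    m = bisect_left(z, x_k) - 1
    if m < 0 or m + 1 >= len(z):
        # Assumption of this sketch: points outside (z[0], z[-1]] are rejected.
        raise ValueError("x_k lies outside the range covered by the inducing points")
    return m


z = [0.0, 1.0, 2.0, 3.0]   # inducing time points (illustrative values)
x = [0.5, 1.0, 1.2, 3.0]   # training time points (illustrative values)
sites = [site_index(z, x_k) for x_k in x]
print(sites)  # [0, 0, 1, 2]
```

A point that coincides with an inducing point, such as x = 1.0 here, is assigned to the site on its left because the upper bound of the interval is inclusive.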
The natural gradient update of the sites is similar to that of the CVIGaussianProcess, except that it applies to a different parameterization of the sites.
kernel – A kernel that defines a prior over functions.
inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].
likelihood – A likelihood.
mean_function – The mean function for the GP. Defaults to no mean function.
learning_rate – The learning rate.
dist_q
Computes the variational posterior distribution on the vector of inducing states.
update_sites
𝜽ₘ ← ρ𝜽ₘ + (1-ρ)𝐠ₘ
Here 𝐠ₘ is the sum of the gradients of the variational expectations for each data point indexed k, projected back to the site vₘ through the conditional p(fₖ|vₘ).
input_data – A tuple of time points and observations.
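The update rule can be illustrated with plain lists. The following is a minimal sketch, assuming each data point's gradient has already been projected back to its site and that the natural parameters of a site are flattened into a short list of numbers; the names `damped_site_update`, `point_grads` and `point_sites` are hypothetical, and the projection through p(fₖ|vₘ) is deliberately left out.

```python
def damped_site_update(theta, grad, rho):
    """Apply theta <- rho * theta + (1 - rho) * grad elementwise."""
    return [rho * t + (1.0 - rho) * g for t, g in zip(theta, grad)]


# Already-projected gradients, one per data point (illustrative values),
# and the index of the site each data point belongs to.
point_grads = [[1.0, -0.5], [0.5, -0.5], [2.0, -1.0]]
point_sites = [0, 0, 1]
num_sites = 2

# g[m] accumulates the gradients of all data points that belong to site m.
g = [[0.0, 0.0] for _ in range(num_sites)]
for m, grad_k in zip(point_sites, point_grads):
    g[m] = [a + b for a, b in zip(g[m], grad_k)]

theta = [[0.0, -1.0], [0.0, -1.0]]  # current natural parameters of each site
rho = 0.5
theta_new = [damped_site_update(t, g_m, rho) for t, g_m in zip(theta, g)]
print(theta_new)  # [[0.75, -1.0], [1.0, -1.0]]
```

With the rule as written, ρ = 1 leaves the sites unchanged and ρ = 0 replaces them with 𝐠ₘ.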
loss
Obtain a Tensor representing the loss, which can be used to train the model.
input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model.
posterior
Posterior object to predict outside of the training time points.
local_objective_and_gradients
Returns the local objective and its gradients with respect to the expectation parameters.
Fmu – Means μ, with shape […, latent_dim].
Fvar – Variances σ², with shape […, latent_dim].
Y – Observations Y, with shape […, observation_dim].
The local objective and its gradient with respect to [μ, σ² + μ²].
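For intuition about gradients with respect to the expectation parameters [μ, σ² + μ²], the Gaussian-likelihood case can be worked out by hand. The following is a minimal sketch, assuming a scalar Gaussian likelihood with known noise variance (the class itself accepts a general likelihood); the function names are hypothetical, and whether the library's local objective equals this variational expectation or its negative is not stated here.

```python
import math


def gaussian_variational_expectation(m1, m2, y, noise_var):
    """E_q[log N(y | f, noise_var)] written in terms of m1 = μ and m2 = σ² + μ²."""
    # E_q[(y - f)²] = y² - 2 y μ + (σ² + μ²) = y² - 2 y m1 + m2
    return (
        -0.5 * math.log(2.0 * math.pi * noise_var)
        - (y * y - 2.0 * y * m1 + m2) / (2.0 * noise_var)
    )


def gaussian_expectation_gradients(y, noise_var):
    """Closed-form gradients of the expectation above with respect to (m1, m2)."""
    return y / noise_var, -0.5 / noise_var


# Illustrative values for one data point.
fmu, fvar, y_k, noise_var = 0.3, 0.8, 1.1, 0.25
m1, m2 = fmu, fvar + fmu ** 2

value = gaussian_variational_expectation(m1, m2, y_k, noise_var)
grad_m1, grad_m2 = gaussian_expectation_gradients(y_k, noise_var)

# Central finite differences as an independent check of the closed-form gradients.
eps = 1e-6
fd_m1 = (
    gaussian_variational_expectation(m1 + eps, m2, y_k, noise_var)
    - gaussian_variational_expectation(m1 - eps, m2, y_k, noise_var)
) / (2.0 * eps)
fd_m2 = (
    gaussian_variational_expectation(m1, m2 + eps, y_k, noise_var)
    - gaussian_variational_expectation(m1, m2 - eps, y_k, noise_var)
) / (2.0 * eps)

print(grad_m1, fd_m1)
print(grad_m2, fd_m2)
```

In this Gaussian case the gradients do not depend on μ or σ², because the expectation is linear in the expectation parameters. For other likelihoods they generally do depend on the current posterior marginals.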
local_objective
Local loss in CVI.
Fmu – Means, with shape […, latent_dim].
Fvar – Variances, with shape […, latent_dim].
Y – Observations, with shape […, observation_dim].
The local objective, with shape […].
classic_elbo
ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]
Note: this is mostly for testing purposes and is not to be used for optimization.
input_data – A tuple of time points and observations.
A scalar tensor representing the ELBO.
kernel
Return the kernel of the GP.
dist_p
Return the prior GaussMarkovDistribution.
likelihood
Return the likelihood of the GP.