markovflow.models.variational_cvi

Module containing a model for Conjugate-Computation Variational Inference (CVI).

Module Contents

class GaussianProcessWithSitesBase(input_data: Tuple[tf.Tensor, tf.Tensor], kernel: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None)[source]

Bases: markovflow.models.models.MarkovFlowModel

Base class for site-based Gaussian process approximations such as EP and CVI.

The following notation is used:

  • \(x\) - the time points of the training data

  • \(y\) - observations corresponding to time points \(x\)

  • \(s(.)\) - the latent state of the Markov chain

  • \(f(.)\) - the noise free predictions of the model

  • \(p(y | f)\) - the likelihood

  • \(t(f)\) - a site (indices will refer to the associated data point)

  • \(p(.)\) - the prior distribution

  • \(q(.)\) - the variational distribution

We use the state space formulation of Markovian Gaussian Processes that specifies:

  • The conditional density of neighbouring latent states \(p(sₖ₊₁| sₖ)\)

  • How to read out the latent process from these states \(fₖ = H sₖ\)

The likelihood \(p(yₖ | fₖ)\) links the data to the latent process. We would like to approximate the posterior over the latent states of this model.
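For the kernels considered here, this conditional density typically takes a linear-Gaussian form (a standard construction for Markovian GPs; the transition matrix \(Aₖ\) and process noise covariance \(Qₖ\) depend on the kernel and on the spacing of the time points):

\[p(sₖ₊₁ | sₖ) = 𝓝(Aₖ sₖ, Qₖ)\]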

We parameterize the approximate posterior using sites \(tₖ(fₖ)\):

\[q(s) = p(s) ∏ₖ tₖ(fₖ)\]

…where \(tₖ(fₖ)\) are univariate Gaussian sites parameterized in the natural form:

\[t(f) = exp(𝜽ᵀφ(f) - A(𝜽))\]

…and where \(𝜽=[θ₁,θ₂]\) and \(𝛗(f)=[f,f²]\).

Here, \(𝛗(f)\) are the sufficient statistics and \(𝜽\) are the natural parameters.
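For a univariate Gaussian site \(𝓝(f; μ, σ²)\), the natural parameters are \(θ₁ = μ/σ²\) and \(θ₂ = -1/(2σ²)\). The following plain-Python sketch converts between the two parameterizations; the helper names are illustrative and not part of this API:

def nat_to_meanvar(theta1, theta2):
    """Map natural parameters [θ₁, θ₂] of exp(θ₁f + θ₂f² - A(𝜽)) to (mean, variance)."""
    var = -0.5 / theta2        # σ² = -1 / (2θ₂), requires θ₂ < 0
    mean = theta1 * var        # μ = θ₁ σ²
    return mean, var

def meanvar_to_nat(mean, var):
    """Inverse map: (μ, σ²) -> natural parameters [θ₁, θ₂]."""
    return mean / var, -0.5 / var

print(nat_to_meanvar(*meanvar_to_nat(0.3, 2.0)))    # round trip -> (0.3, 2.0)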

Parameters
  • input_data

    A tuple containing the observed data:

    • Time points of observations with shape batch_shape + [num_data]

    • Observations with shape batch_shape + [num_data, observation_dim]

  • kernel – A kernel that defines a prior over functions.

  • likelihood – A likelihood with shape batch_shape + [num_inducing].

  • mean_function – The mean function for the GP. Defaults to no mean function.

property dist_q: markovflow.state_space_model.StateSpaceModel[source]

Construct the StateSpaceModel representation of the posterior process indexed at the time points.

property posterior_kalman: markovflow.kalman_filter.KalmanFilterWithSites[source]

Build the Kalman filter object from the prior state space models and the sites.

property posterior[source]

Posterior object for making predictions outside of the training time points.

log_likelihood() → tf.Tensor[source]

Calculate \(log p(y)\).

Returns

A scalar tensor representing the ELBO.

property time_points: tf.Tensor[source]

Return the time points of the observations.

Returns

A tensor with shape batch_shape + [num_data].

property conditioning_points: tf.Tensor[source]

Return the time points on which the posterior is conditioned. For this model these are the time points of the observations.

Returns

A tensor with shape batch_shape + [num_data].

property observations: tf.Tensor[source]

Return the observations.

Returns

A tensor with shape batch_shape + [num_data, observation_dim].

property kernel: markovflow.kernels.SDEKernel[source]

Return the kernel.

property likelihood: gpflow.likelihoods.Likelihood[source]

Return the likelihood.

property mean_function: markovflow.mean_function.MeanFunction[source]

Return the mean function.

property dist_p: markovflow.state_space_model.StateSpaceModel[source]

Return the prior Gauss-Markov distribution.

loss() → tf.Tensor[source]

Return the loss, which is the negative ELBO.

class CVIGaussianProcess(input_data: Tuple[tf.Tensor, tf.Tensor], kernel: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, learning_rate=0.1)[source]

Bases: GaussianProcessWithSitesBase

Provides an alternative parameterization to a VariationalGaussianProcess.

This class approximates the posterior of a model with a GP prior and a general likelihood using a Gaussian posterior parameterized with Gaussian sites.

The following notation is used:

  • \(x\) - the time points of the training data

  • \(y\) - observations corresponding to time points \(x\)

  • \(s(.)\) - the latent state of the Markov chain

  • \(f(.)\) - the noise free predictions of the model

  • \(p(y | f)\) - the likelihood

  • \(t(f)\) - a site (indices will refer to the associated data point)

  • \(p(.)\) - the prior distribution

  • \(q(.)\) - the variational distribution

We use the state space formulation of Markovian Gaussian Processes that specifies:

  • The conditional density of neighbouring latent states \(p(sₖ₊₁| sₖ)\)

  • How to read out the latent process from these states \(fₖ = H sₖ\)

The likelihood \(p(yₖ | fₖ)\) links the data to the latent process. We would like to approximate the posterior over the latent states of this model.

To approximate the posterior, we maximise the evidence lower bound (ELBO) \(ℒ\) with respect to the parameters of the variational distribution, since:

\[log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]\]

…where:

\[ℒ(q) = ∫ log(p(s, y) / q(s)) q(s) ds\]

We parameterize the variational posterior through sites \(tₖ(fₖ)\):

\[q(s) = p(s) ∏ₖ tₖ(fₖ)\]

…where \(tₖ(fₖ)\) are univariate Gaussian sites parameterized in the natural form:

\[t(f) = exp(𝜽ᵀφ(f) - A(𝜽))\]

…and where \(𝜽=[θ₁,θ₂]\) and \(𝛗(f)=[f,f²]\).

Here, \(𝛗(f)\) are the sufficient statistics and \(𝜽\) are the natural parameters. Note that the subscript \(k\) has been omitted for simplicity.

The natural gradient update of the sites can be shown to be the gradient of the variational expectations:

\[𝐠 = ∇[𝞰][∫ log(p(y=Y|f)) q(f) df]\]

…with respect to the expectation parameters:

\[𝞰 = E[𝛗(f)] = [μ, σ² + μ²]\]

That is, \(𝜽 ← ρ𝜽 + (1-ρ)𝐠\), where \(ρ\) is the learning rate.
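As a concrete illustration, for a Gaussian likelihood \(p(y|f) = 𝓝(y; f, σₙ²)\) the variational expectation is \(-½ log(2πσₙ²) - (y² - 2yη₁ + η₂)/(2σₙ²)\), so \(𝐠 = [y/σₙ², -1/(2σₙ²)]\) is constant and the update converges to the natural parameters of the exact likelihood site. A minimal plain-Python sketch of this special case (illustrative only; it does not use this class's API):

def cvi_site_update(theta1, theta2, y, noise_var, rho=0.1):
    """One site update 𝜽 ← ρ𝜽 + (1-ρ)𝐠 for a single data point under a
    Gaussian likelihood, where 𝐠 = [y/σₙ², -1/(2σₙ²)] in closed form."""
    g1, g2 = y / noise_var, -0.5 / noise_var
    return rho * theta1 + (1 - rho) * g1, rho * theta2 + (1 - rho) * g2

theta1, theta2 = 0.0, -1e-6                  # near-flat initial site
for _ in range(50):
    theta1, theta2 = cvi_site_update(theta1, theta2, y=1.3, noise_var=0.5)
print(theta1, theta2)                        # ≈ (2.6, -1.0) = (y/σₙ², -1/(2σₙ²))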

The key reference is:

@inproceedings{khan2017conjugate,
  title={Conjugate-Computation Variational Inference: Converting Variational Inference
         in Non-Conjugate Models to Inferences in Conjugate Models},
  author={Khan, Mohammad and Lin, Wu},
  booktitle={Artificial Intelligence and Statistics},
  pages={878--887},
  year={2017}
}

Parameters
  • input_data

    A tuple containing the observed data:

    • Time points of observations with shape batch_shape + [num_data]

    • Observations with shape batch_shape + [num_data, observation_dim]

  • kernel – A kernel that defines a prior over functions.

  • likelihood – A likelihood with shape batch_shape + [num_inducing].

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • learning_rate – The learning rate of the algorithm.
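A minimal construction sketch on toy data follows. The Matern32 constructor arguments and import paths are assumptions about the wider markovflow/gpflow API and may differ between versions:

import numpy as np
import tensorflow as tf
import gpflow
from markovflow.kernels import Matern32
from markovflow.models.variational_cvi import CVIGaussianProcess

# Toy 1D regression data: time points [num_data], observations [num_data, 1]
time_points = np.linspace(0.0, 10.0, 100)
observations = np.sin(time_points)[:, None] + 0.3 * np.random.randn(100, 1)

model = CVIGaussianProcess(
    input_data=(tf.constant(time_points), tf.constant(observations)),
    kernel=Matern32(lengthscale=1.0, variance=1.0),      # assumed constructor arguments
    likelihood=gpflow.likelihoods.Gaussian(variance=0.1),
    learning_rate=0.5,
)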

local_objective(Fmu: tf.Tensor, Fvar: tf.Tensor, Y: tf.Tensor) → tf.Tensor[source]

Calculate the local loss in CVI.

Parameters
  • Fmu – Means with shape [..., latent_dim].

  • Fvar – Variances with shape [..., latent_dim].

  • Y – Observations with shape [..., observation_dim].

Returns

A local objective with shape [...].

local_objective_and_gradients(Fmu: tf.Tensor, Fvar: tf.Tensor) → tf.Tensor[source]

Return the local objective and its gradients with regard to the expectation parameters.

Parameters
  • Fmu – Means \(μ\) with shape [..., latent_dim].

  • Fvar – Variances \(σ²\) with shape [..., latent_dim].

Returns

A local objective and gradient with regard to \([μ, σ² + μ²]\).

update_sites() → None[source]

Perform one joint update of the Gaussian sites. That is:

\[𝜽 ← ρ𝜽 + (1-ρ)𝐠\]
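Continuing the construction sketch above, a typical fitting loop alternates site updates with gradient steps on the kernel hyperparameters through loss() (the negative ELBO). This is an illustrative pattern under the assumption that the kernel exposes trainable_variables, not a prescribed training procedure:

opt = tf.optimizers.Adam(learning_rate=0.05)

for step in range(200):
    model.update_sites()                                           # natural-gradient site update
    opt.minimize(model.loss, model.kernel.trainable_variables)     # hyperparameter step
    if step % 50 == 0:
        print(step, model.elbo().numpy())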

elbo() → tf.Tensor[source]

Calculate the evidence lower bound (ELBO) on \(log p(y)\).

This is done by computing the log marginal likelihood of the model in which the likelihood terms are replaced by the Gaussian sites.

Returns

A scalar tensor representing the ELBO.

classic_elbo() → tf.Tensor[source]

Compute the ELBO the classic way. That is:

\[ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]\]

Note

This is mostly for testing purposes and should not be used for optimization.

Returns

A scalar tensor representing the ELBO.
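For example, once the sites have converged (say, with a Gaussian likelihood), elbo() and classic_elbo() are expected to agree closely, which is how this method is typically used in tests:

for _ in range(100):
    model.update_sites()
print(model.elbo().numpy(), model.classic_elbo().numpy())   # should be close at convergence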

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor[source]

Compute the log density of the data at the new data points.

Parameters
  • input_data

    A tuple of time points and observations containing the data at which to calculate the log density:

    • A tensor of inputs with shape batch_shape + [num_data]

    • A tensor of observations with shape batch_shape + [num_data, observation_dim]

  • full_output_cov – Either full output covariance (True) or marginal variances (False).

back_project_nats(nat1, nat2, C)[source]

Transform the natural parameters \(𝜽f\) of a Gaussian with sufficient statistics \(𝛗(f)=[f, f²]\) into the equivalent rank-one natural parameters \(𝜽g\) of a (degenerate) Gaussian with sufficient statistics \(𝛗(g)=[g, ggᵀ]\), where \(f = Cg\).

In practice, \([θg₁, θg₂] = [θf₁ C, θf₂ CᵀC]\).

Parameters
  • nat1 – Natural parameters with shape (num_time_points, 1).

  • nat2 – Natural parameters with shape (num_time_points, 1).

  • C – Projection with shape (num_time_points, 1, project_dim).

Returns

Natural parameters with shapes (num_time_points, project_dim) and (num_time_points, project_dim, project_dim).
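A minimal TensorFlow sketch of this projection under the shapes listed above (illustrative; not necessarily the implementation used here):

import tensorflow as tf

def back_project_nats_sketch(nat1, nat2, C):
    """Illustrative version of [θg₁, θg₂] = [θf₁ C, θf₂ CᵀC]."""
    # θg₁ = θf₁ C : (num_time_points, project_dim)
    theta_g1 = nat1 * C[:, 0, :]
    # θg₂ = θf₂ CᵀC : (num_time_points, project_dim, project_dim)
    CtC = tf.matmul(C, C, transpose_a=True)
    theta_g2 = nat2[..., None] * CtC
    return theta_g1, theta_g2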

gradient_transformation_mean_var_to_expectation(inputs: Tuple[tf.Tensor, tf.Tensor], grads: Tuple[tf.Tensor, tf.Tensor]) → Tuple[tf.Tensor, tf.Tensor][source]

Transform gradients.

This converts the gradients \(𝐠\) of a function with regard to \([μ, σ²]\) into gradients with regard to \([μ, σ² + μ²]\).

Parameters
  • inputs – Means and variances \([μ, σ²]\).

  • grads – Gradients \(𝐠\).

Returns

The gradients with regard to \([μ, σ² + μ²]\).
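The transformation follows from the chain rule: with \(η₁ = μ\) and \(η₂ = σ² + μ²\) we have \(σ² = η₂ - η₁²\), so \(∂L/∂η₁ = ∂L/∂μ - 2μ ∂L/∂σ²\) and \(∂L/∂η₂ = ∂L/∂σ²\). A minimal sketch under those assumptions (illustrative only):

def grads_meanvar_to_expectation_sketch(inputs, grads):
    """Chain rule mapping gradients w.r.t. [μ, σ²] to gradients w.r.t. 𝞰 = [μ, σ² + μ²]."""
    mean, _ = inputs          # [μ, σ²]
    dmu, dvar = grads         # [∂L/∂μ, ∂L/∂σ²]
    return dmu - 2.0 * mean * dvar, dvar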