markovflow.models

Package containing ready-to-use GP models.

Package Contents

class GaussianProcessRegression(input_data: Tuple[tf.Tensor, tf.Tensor], kernel: markovflow.kernels.SDEKernel, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, chol_obs_covariance: Optional[gpflow.base.TensorType] = None)[source]

Bases: markovflow.models.models.MarkovFlowModel

Performs GP regression.

The key reference is Chapter 2 of:

Gaussian Processes for Machine Learning
Carl Edward Rasmussen and Christopher K. I. Williams
The MIT Press, 2006. ISBN 0-262-18253-X.

This class uses the kernel and the time points to create a state space model. GP regression is then a Kalman filter on that state space model using the observations.

Parameters
  • kernel – A kernel defining a prior over functions.

  • input_data – A tuple of (time_points, observations) containing the observed data: time points of observations, with shape batch_shape + [num_data], observations with shape batch_shape + [num_data, observation_dim].

  • chol_obs_covariance – A TensorType containing the Cholesky factor of the observation noise covariance, with shape [observation_dim, observation_dim]. If None, the observation noise is assumed to be independent with variance 1.0.

  • mean_function – The mean function for the GP. Defaults to no mean function.
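
For example, a minimal end-to-end sketch. Only GaussianProcessRegression and the methods documented here are taken from this page; the Matern32 kernel constructor and its arguments are illustrative assumptions:

import numpy as np
import tensorflow as tf
import markovflow.kernels

# Toy data: noisy observations of a sine wave.
time_points = np.linspace(0.0, 10.0, 100)
observations = np.sin(time_points)[:, None] + 0.1 * np.random.randn(100, 1)
input_data = (tf.constant(time_points), tf.constant(observations))

kernel = markovflow.kernels.Matern32(lengthscale=1.0, variance=1.0)  # assumed constructor
model = GaussianProcessRegression(input_data=input_data, kernel=kernel)

# Train the kernel hyperparameters by minimising the negative log likelihood.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
for _ in range(100):
    optimizer.minimize(model.loss, model.trainable_variables)

# Predict marginal function values at new time points.
mean, variance = model.predict_f(tf.constant(np.linspace(0.0, 12.0, 50)))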

property time_points: tf.Tensor

Return the time points of observations.

Returns

A tensor with shape batch_shape + [num_data].

property observations: tf.Tensor

Return the observations.

Returns

A tensor with shape batch_shape + [num_data, observation_dim].

property kernel: markovflow.kernels.SDEKernel

Return the kernel of the GP.

property mean_function: markovflow.mean_function.MeanFunction

Return the mean function of the GP.

loss() → tf.Tensor

Return the loss, which is the negative log likelihood.

property posterior: markovflow.posterior.PosteriorProcess

Obtain a posterior process for inference.

For this class, this is the AnalyticPosteriorProcess built from the Kalman filter.

log_likelihood() → tf.Tensor

Calculate the log likelihood of the observations given the kernel parameters.

In other words, \(log p(y_{1...T} | ϑ)\) for some parameters \(ϑ\).

Returns

A scalar tensor (summed over the batch shape and the whole trajectory).

class ImportanceWeightedVI(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: gpflow.likelihoods.Likelihood, num_importance_samples: int, initial_distribution: Optional[markovflow.state_space_model.StateSpaceModel] = None, mean_function: Optional[markovflow.mean_function.MeanFunction] = None)[source]

Bases: markovflow.models.sparse_variational.SparseVariationalGaussianProcess

Performs importance-weighted variational inference (IWVI).

The key reference is:

@inproceedings{domke2018importance,
  title={Importance weighting and variational inference},
  author={Domke, Justin and Sheldon, Daniel R},
  booktitle={Advances in neural information processing systems},
  pages={4470--4479},
  year={2018}
}

The idea is based on the observation that an estimator of the evidence lower bound (ELBO) can be obtained from an importance weight \(w\):

\[L₁ = log w(x₁), x₁ ~ q(x)\]

…where \(x\) is the latent variable of the model (a GP, or set of GPs in our case) and the function \(w\) is:

\[w(x) = p(y | x) p(x) / q(x)\]

It follows that:

\[ELBO = 𝔼ₓ₁[ L₁ ]\]

…and:

\[log p(y) = log 𝔼ₓ₁[ w(x₁) ]\]

It turns out that there are a series of lower bounds given by taking multiple importance samples:

\[Lₙ = log (1/n) Σᵢⁿ w(xᵢ), xᵢ ~ q(x)\]

And we have the relation:

\[log p(y) >= 𝔼[Lₙ] >= 𝔼[Lₙ₋₁] >= ... >= 𝔼[L₁] = ELBO\]

This means that we can improve tightness of the ELBO to the log marginal likelihood by increasing \(n\), which we refer to in this class as num_importance_samples. The trade-offs are:

  • The objective function is now always stochastic, even for cases where the ELBO of the parent class is non-stochastic

  • We have to do more computations (evaluate the weights \(n\) times)

Parameters
  • kernel – A kernel that defines a prior over functions.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • likelihood – A likelihood.

  • num_importance_samples – The number of samples for the importance-weighted estimator.

  • initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].

  • mean_function – The mean function for the GP. Defaults to no mean function.

elbo(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor

Compute the importance-weighted ELBO using K samples. The procedure is:

for k=1...K:
    uₖ ~ q(u)
    sₖ ~ p(s | u)
    wₖ = p(y | sₖ)p(uₖ) / q(uₖ)

ELBO = log (1/K) Σₖwₖ

Everything is computed in log-space for stability. Note that gradients of this ELBO may have high variance with regard to the variational parameters; see the DREGS gradient estimator method.
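
For reference, the averaging step is a log-mean-exp of the per-sample log-weights; a minimal sketch using standard TensorFlow ops, where log_w is a hypothetical tensor holding the K values log wₖ:

import tensorflow as tf

def log_mean_exp(log_w: tf.Tensor, axis: int = 0) -> tf.Tensor:
    # Numerically stable log((1/K) Σₖ exp(log wₖ)).
    num_samples = tf.cast(tf.shape(log_w)[axis], log_w.dtype)
    return tf.reduce_logsumexp(log_w, axis=axis) - tf.math.log(num_samples)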

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor.

dregs_objective(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor

Compute a scalar tensor that, when differentiated using tf.gradients, produces the DREGS variance controlled gradient.

See “Doubly Reparameterized Gradient Estimators For Monte Carlo Objectives” for a derivation.

We recommend using these gradients for training variational parameters and gradients of the importance-weighted ELBO for training hyperparameters.

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor.
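
For illustration, a sketch of one training step that follows this recommendation. How the variational and hyperparameter variables are split out of model.trainable_variables, and the sign convention for the objectives, are assumptions here:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

def training_step(model, input_data, variational_vars, hyper_vars):
    # DREGS gradients for the variational parameters (lower variance);
    # the objective is assumed to be maximised, hence the negation.
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(variational_vars)
        objective = -model.dregs_objective(input_data)
    grads = tape.gradient(objective, variational_vars)
    optimizer.apply_gradients(zip(grads, variational_vars))

    # Importance-weighted ELBO gradients for the hyperparameters.
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(hyper_vars)
        objective = -model.elbo(input_data)
    grads = tape.gradient(objective, hyper_vars)
    optimizer.apply_gradients(zip(grads, hyper_vars))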

class MarkovFlowModel(name=None)[source]

Bases: tf.Module, abc.ABC

Abstract class representing Markovflow models that depend on input data.

All Markovflow models are TensorFlow Modules, so it is possible to obtain trainable variables via the trainable_variables attribute. You can combine this with the loss() method to train the model. For example:

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.01)
for i in range(iterations):
    model.optimization_step(optimizer)

Call the predict_f() method to predict marginal function values at future time points. For example:

mean, variance = model.predict_f(validation_data_tensor)
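
Equivalently, a minimal sketch of an explicit training loop that uses tf.GradientTape with the loss() method and the trainable_variables attribute (the Keras optimizer is just one possible choice):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
for _ in range(iterations):
    with tf.GradientTape() as tape:
        loss = model.loss()
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))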

Note

Markovflow models that extend this class must implement the loss() method and posterior attribute.

abstract loss() → tf.Tensor

Obtain the loss, which you can use to train the model. It should always return a scalar.

Raises

NotImplementedError – Must be implemented in derived classes.

property posterior: markovflow.posterior.PosteriorProcess

Return a posterior process from the model, which can be used for inference.

Raises

NotImplementedError – Must be implemented in derived classes.

predict_state(new_time_points: tf.Tensor) → Tuple[tf.Tensor, tf.Tensor]

Predict state at new_time_points. Note these time points should be sorted.

Parameters

new_time_points – Time points to generate observations for, with shape batch_shape + [num_new_time_points,].

Returns

Predicted mean and covariance for the new time points, with respective shapes batch_shape + [num_new_time_points, state_dim] and batch_shape + [num_new_time_points, state_dim, state_dim].

predict_f(new_time_points: tf.Tensor, full_output_cov: bool = False) → Tuple[tf.Tensor, tf.Tensor]

Predict marginal function values at new_time_points. Note these time points should be sorted.

Parameters
  • new_time_points – Time points to generate observations for, with shape batch_shape + [num_new_time_points].

  • full_output_cov – Either full output covariance (True) or marginal variances (False).

Returns

Predicted mean and covariance for the new time points, with respective shapes batch_shape + [num_new_time_points, output_dim] and either batch_shape + [num_new_time_points, output_dim, output_dim] or batch_shape + [num_new_time_points, output_dim].

class MarkovFlowSparseModel(name=None)[source]

Bases: tf.Module, abc.ABC

Abstract class representing Markovflow models that do not need to store the training data (\(X, Y\)) in the model to approximate the posterior predictions \(p(f*|X, Y, x*)\).

This currently applies only to sparse variational models.

The optimization_step method should typically be used to train the model. For example:

input_data = (tf.constant(time_points), tf.constant(observations))
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.01)
for i in range(iterations):
    model.optimization_step(input_data, optimizer)

Call the predict_f() method to predict marginal function values at future time points. For example:

mean, variance = model.predict_f(validation_data_tensor)
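
Because the training data is not stored in the model, loss(input_data) can be evaluated on minibatches; a sketch using tf.data (the batch size and number of steps are arbitrary):

import tensorflow as tf

dataset = (
    tf.data.Dataset.from_tensor_slices((time_points, observations))
    .shuffle(buffer_size=1000)
    .batch(64)
    .repeat()
)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
for batch in dataset.take(1000):
    with tf.GradientTape() as tape:
        loss = model.loss(batch)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))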

Note

Markovflow models that extend this class must implement the loss() method and posterior attribute.

abstract loss(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor

Obtain the loss, which can be used to train the model.

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Raises

NotImplementedError – Must be implemented in derived classes.

property posterior: markovflow.posterior.PosteriorProcess

Obtain a posterior process from the model, which can be used for inference.

Raises

NotImplementedError – Must be implemented in derived classes.

predict_state(new_time_points: tf.Tensor) → Tuple[tf.Tensor, tf.Tensor]

Predict state at new_time_points. Note these time points should be sorted.

Parameters

new_time_points – Time points to generate observations for, with shape batch_shape + [num_new_time_points,].

Returns

Predicted mean and covariance for the new time points, with respective shapes batch_shape + [num_new_time_points, state_dim] and batch_shape + [num_new_time_points, state_dim, state_dim].

predict_f(new_time_points: tf.Tensor, full_output_cov: bool = False) → Tuple[tf.Tensor, tf.Tensor]

Predict marginal function values at new_time_points. Note these time points should be sorted.

Parameters
  • new_time_points – Time points to generate observations for, with shape batch_shape + [num_new_time_points].

  • full_output_cov – Either full output covariance (True) or marginal variances (False).

Returns

Predicted mean and covariance for the new time points, with respective shapes batch_shape + [num_new_time_points, output_dim] and either batch_shape + [num_new_time_points, output_dim, output_dim] or batch_shape + [num_new_time_points, output_dim].

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor

Compute the log density of the data. That is:

\[log ∫ p(yᵢ=Yᵢ|Fᵢ)q(Fᵢ) dFᵢ\]
Parameters
  • input_data

    A tuple of time points and observations containing the data at which to evaluate the log density:

    • A tensor of inputs with shape batch_shape + [num_data]

    • A tensor of observations with shape batch_shape + [num_data, observation_dim]

  • full_output_cov – Either full output covariance (True) or marginal variances (False).

Returns

Predicted log density at input time points, with shape batch_shape + [num_data].

class PowerExpectationPropagation(kernel: markovflow.kernels.SDEKernel, input_data: Tuple[tf.Tensor, tf.Tensor], likelihood: markovflow.likelihoods.PEPScalarLikelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, learning_rate=1.0, alpha=1.0)[source]

Bases: markovflow.models.variational_cvi.GaussianProcessWithSitesBase

Performs approximate inference using Power Expectation Propagation (PEP).

Approximates the posterior of a model with a GP prior and a general likelihood, using a Gaussian posterior parameterized with Gaussian sites.

The following notation is used:

  • x - the time points of the training data.

  • y - observations corresponding to time points x.

  • s(.) - the latent state of the Markov chain

  • f(.) - the noise free predictions of the model

  • p(y | f) - the likelihood

  • t(f) - a site (indices will refer to the associated data point)

  • p(.) the prior distribution

  • q(.) the variational distribution

We use the state space formulation of Markovian Gaussian Processes, which specifies: the conditional density of neighbouring latent states, p(xₖ₊₁| xₖ); and how to read out the latent process from these states, fₖ = H xₖ.

The likelihood p(yₖ | fₖ) links the data to the latent process. We would like to approximate the posterior over the latent state space model.

We parameterize the joint posterior using sites tₖ(fₖ)

p(x, y) = p(x) ∏ₖ tₖ(fₖ)

where tₖ(fₖ) are univariate Gaussian sites parameterized in the natural form

t(f) = exp(𝞰ᵀφ(f) - A(𝞰)), where 𝞰=[η₁,η₂] and 𝛗(f)=[f,f²]

(note: the subscript k has been omitted for simplicity)

The site updates are given by the classic EP update rules, as described in:

@techreport{seeger2005expectation,
  title={Expectation propagation for exponential families},
  author={Seeger, Matthias},
  year={2005}
}

Parameters
  • kernel – A kernel that defines a prior over functions.

  • input_data – A tuple of (time_points, observations) containing the observed data: time points of observations, with shape batch_shape + [num_data], observations with shape batch_shape + [num_data, observation_dim].

  • likelihood – A likelihood.

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • learning_rate – the learning rate of the algorithm

  • alpha – the power as in Power Expectation propagation
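
A sketch of a typical fitting loop, interleaving EP site updates with gradient steps on the PEP energy for the hyperparameters. The kernel, likelihood and the use of the energy as the hyperparameter objective are assumptions rather than requirements of this class, and gradients of None are filtered out in case some variables do not affect the energy:

import tensorflow as tf

model = PowerExpectationPropagation(
    kernel=kernel, input_data=input_data, likelihood=likelihood,
    learning_rate=1.0, alpha=0.5,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
for _ in range(20):
    model.update_sites()              # one EP pass over the sites
    with tf.GradientTape() as tape:
        objective = -model.energy()   # maximise the PEP energy
    gradients = tape.gradient(objective, model.trainable_variables)
    grads_and_vars = [
        (g, v) for g, v in zip(gradients, model.trainable_variables) if g is not None
    ]
    optimizer.apply_gradients(grads_and_vars)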

local_objective(Fmu, Fvar, Y)

Local objective of the PEP algorithm: log E_q(f) p(y|f)ᵃ

local_objective_gradients(Fmu, Fvar)

Gradients of the local objective of the PEP algorithm with respect to the predictive mean.

mask_indices(exclude_indices)

Binary mask (cast to float), 0 for the excluded indices, 1 for the rest

compute_cavity_from_marginals(marginals)

Compute the cavity from the marginals.

Parameters

marginals – A list of tensors.

compute_cavity()

The cavity distributions for all data points. This corresponds to the marginal distribution qᐠⁿ(fₙ) of qᐠⁿ(f) = q(f)/tₙ(fₙ)ᵃ

compute_log_norm()

Compute log normalizer

update_sites(site_indices=None)

Compute the site updates and perform one update step.

Parameters

site_indices – A list of indices to be updated.

elbo() → tf.Tensor

Computes the log marginal likelihood of the approximate joint p(s, y).

energy()

PEP Energy

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor

Compute the log density of the data at the new data points.

Parameters
  • input_data – A tuple of time points and observations containing the data at which to evaluate the log density: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

  • full_output_cov – Either full output covariance (True) or marginal variances (False).

class SparsePowerExpectationPropagation(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: markovflow.likelihoods.PEPScalarLikelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, learning_rate=1.0, alpha=1.0)[source]

Bases: markovflow.models.models.MarkovFlowSparseModel

This is the Sparse Power Expectation Propagation algorithm.

Approximates the posterior of a model with a GP prior and a general likelihood, using a Gaussian posterior parameterized with Gaussian sites on inducing states u at inducing points z.

The following notation is used:

  • x - the time points of the training data.

  • z - the time points of the inducing/pseudo points.

  • y - observations corresponding to time points x.

  • s(.) - the continuous time latent state process

  • u = s(z) - the discrete inducing latent state space model

  • f(.) - the noise free predictions of the model

  • p(y | f) - the likelihood

  • t(u) - a site (indices will refer to the associated data point)

  • p(.) the prior distribution

  • q(.) the variational distribution

We use the state space formulation of Markovian Gaussian Processes, which specifies: the conditional density of neighbouring latent states, p(sₖ₊₁| sₖ); and how to read out the latent process from these states, fₖ = H sₖ.

The likelihood p(yₖ | fₖ) links the data to the latent process. We would like to approximate the posterior over the latent state space model.

To approximate the posterior, we maximise the evidence lower bound (ELBO) (ℒ) with respect to the parameters of the variational distribution, since:

log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]

…where:

ℒ(q) = ∫ log(p(s, y) / q(s)) q(s) ds

We parameterize the variational posterior through M sites tₘ(vₘ)

q(s) = p(s) ∏ₘ tₘ(vₘ)

where tₘ(vₘ) are multivariate Gaussian sites on vₘ = [uₘ, uₘ₊₁], i.e. consecutive inducing states.

The sites are parameterized in the natural form

t(v) = exp(𝜽ᵀφ(v) - A(𝜽)), where 𝜽=[θ₁, θ₂] and 𝛗(u)=[v, vᵀv]

where 𝛗(v) are the sufficient statistics and 𝜽 the natural parameters.

Parameters
  • kernel – A kernel that defines a prior over functions.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • likelihood – A likelihood.

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • learning_rate – the learning rate

  • alpha – power as in Power Expectation Propagation

posterior()

Posterior Process

mask_indices(exclude_indices)

Binary mask to exclude data indices.

Parameters

exclude_indices – The data indices to exclude.

back_project_nats(nat1, nat2, time_points)

Back-project the natural gradients associated with the time points onto their associated inducing sites.

local_objective(Fmu, Fvar, Y)

Local objective of the PEP algorithm: log E_q(f) p(y|f)ᵃ

local_objective_gradients(fx_mus, fx_covs, observations, alpha=1.0)

Gradients of the local objective of the PEP algorithm with respect to the predictive mean.

fraction_sites(time_points)

For each segment indexed m between consecutive inducing points [zₘ, zₘ₊₁), count the time points t falling in that segment, c(m) = #{t : zₘ <= t < zₘ₊₁}, and return 1/c(m), or 0 when c(m) = 0.

Parameters

time_points – tensor of shape batch_shape + [num_data]

Returns

tensor of shape batch_shape + [num_data]

compute_posterior_ssm(nat1, nat2)

Computes the variational posterior distribution on the vector of inducing states

property dist_q

Computes the variational posterior distribution on the vector of inducing states

compute_marginals()

Compute pairwise marginals

remove_cavity_from_marginals(time_points, marginals)

Remove the cavity from the marginals.

Parameters
  • time_points – The time points.

  • marginals – Pairwise mean and covariance tensors.

compute_cavity_state(time_points)

The cavity distributions for data points at input time_points. This corresponds to the marginal distribution qᐠⁿ(fₙ) of qᐠⁿ(s) = q(s)/tₘ(vₘ)ᵝᵃ, where β = a * (1 / #time points touching site tₘ)

compute_cavity(time_points)

Compute the cavity on f.

Parameters

time_points – The time points.

compute_new_sites(input_data)

Compute the site updates and perform one update step.

Parameters

input_data – A tuple of time points and observations containing the data from which to calculate the updates: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

compute_log_norm(input_data)

Compute the log normalizer.

Parameters

input_data – A tuple of time points and observations containing the data from which to calculate the updates: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

compute_num_data_per_interval(time_points)

Compute the number of data points per inducing-point interval.

compute_fraction(time_points)

compute fraction of site per data point

update_sites(input_data)

Apply the site updates.

energy(input_data)

The PEP energy: ∫ ds p(s) ∏ₘ tₘ(vₘ)

Parameters

input_data – The input data.

loss(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor

Return the loss, which is the negative evidence lower bound (ELBO).

Parameters

input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model.

property dist_p: markovflow.gauss_markov.GaussMarkovDistribution

Return the prior GaussMarkovDistribution.

property kernel: markovflow.kernels.SDEKernel

Return the kernel of the GP.

classic_elbo(input_data: Tuple[tf.Tensor, tf.Tensor])

Computes the ELBO the classic way:

ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]

Note: this is mostly for testing purposes and should not be used for optimization.

Parameters

input_data – A tuple of time points and observations

Returns

A scalar tensor representing the ELBO.

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor

Compute the log density of the data at the new data points.

Parameters
  • input_data – A tuple of time points and observations containing the data at which to evaluate the log density: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

  • full_output_cov – Either full output covariance (True) or marginal variances (False).

class SparseVariationalGaussianProcess(kernel: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, inducing_points: tf.Tensor, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, num_data: Optional[int] = None, initial_distribution: Optional[markovflow.gauss_markov.GaussMarkovDistribution] = None)[source]

Bases: markovflow.models.models.MarkovFlowSparseModel

Approximate a GaussMarkovDistribution with a general likelihood using a Gaussian posterior. Additionally uses a number of pseudo, or inducing, points to represent the distribution over a typically larger number of data points.

The following notation is used:

  • \(x\) - the time points of the training data

  • \(z\) - the time points of the inducing/pseudo points

  • \(y\) - observations corresponding to time points \(x\)

  • \(s(.)\) - the latent state of the Markov chain

  • \(f(.)\) - the noise free predictions of the model

  • \(p(y | f)\) - the likelihood

  • \(p(.)\) - the true distribution

  • \(q(.)\) - the variational distribution

Subscript is used to denote dependence for notational convenience, for example \(fₖ === f(k)\).

With a prior generative model comprising a Gauss-Markov distribution, an emission model and an arbitrary likelihood on the emitted variables, these define:

  • \(p(xₖ₊₁| xₖ)\)

  • \(fₖ = H xₖ\)

  • \(p(yₖ | fₖ)\)

As per a VariationalGaussianProcess (VGP) model, we have:

\[log p(y) >= ℒ(q)\]

\[ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]\]

…where \(f\) is defined over the entire function space.

Here this reduces to the joint of the evidence lower bound (ELBO) defined over both the data \(x\) and the inducing points \(z\), which we rewrite as:

\[ℒ(q(x, z)) = Σᵢ ∫ log(p(yᵢ | fₓ)) q(fₓ) df - KL[q(f(z)) ‖ p(f(z))]\]

This turns the inference problem into an optimisation problem: find the optimal \(q\).

The first term is the variational expectation and has the same form as in a VGP model. However, we must now use the inducing states to predict the marginals of the variational distribution at the original data points.

The second is the KL from the prior to the approximation, but evaluated at the inducing points.

The key reference is:

@inproceedings{,
    title={Doubly Sparse Variational Gaussian Processes},
    author={Adam, Eleftheriadis, Artemev, Durrande, Hensman},
    booktitle={},
    pages={},
    year={},
    organization={}
}

Note

Since this class extends MarkovFlowSparseModel, it does not depend on input data. Input data is passed during the optimisation step as a tuple of time points and observations.

Parameters
  • kernel – A kernel that defines a prior over functions.

  • likelihood – A likelihood.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • num_data – The total number of observations. (relevant when feeding in external minibatches).

  • initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
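
A construction sketch; the Matern32 constructor is an assumed example, while gpflow.likelihoods.Gaussian is standard GPflow:

import numpy as np
import tensorflow as tf
import gpflow
import markovflow.kernels

inducing_points = tf.constant(np.linspace(0.0, 10.0, 20))
model = SparseVariationalGaussianProcess(
    kernel=markovflow.kernels.Matern32(lengthscale=1.0, variance=1.0),  # assumed constructor
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_points=inducing_points,
    num_data=len(time_points),  # total number of observations, so minibatch ELBOs are rescaled
)
loss = model.loss((tf.constant(time_points), tf.constant(observations)))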

elbo(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor

Calculates the evidence lower bound (ELBO), a lower bound to \(log p(y)\). We rewrite this as:

\[ℒ(q(x, z)) = Σᵢ ∫ log(p(yᵢ | fₓ)) q(fₓ) df - KL[q(s(z)) ‖ p(s(z))]\]

The first term is the ‘variational expectation’ (VE), and has the same form as per a VariationalGaussianProcess (VGP) model. However, we must now use the inducing states to predict the marginals of the variational distribution at the original data points.

The second is the KL divergence from the prior to the approximation, but evaluated at the inducing points.

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor (summed over the batch_shape dimension) representing the ELBO.

property time_points: tf.Tensor

Return the time points of the sparse process, which are the locations of the inducing points.

Returns

A tensor with shape batch_shape + [num_inducing]. Same as inducing inputs.

property kernel: markovflow.kernels.SDEKernel

Return the kernel of the GP.

property likelihood: gpflow.likelihoods.Likelihood

Return the likelihood of the GP.

property mean_function: markovflow.mean_function.MeanFunction

Return the mean function of the GP.

property dist_p: markovflow.gauss_markov.GaussMarkovDistribution

Return the prior Gauss-Markov distribution.

property dist_q: markovflow.gauss_markov.GaussMarkovDistribution

Return the variational distribution as a Gauss-Markov distribution.

property posterior: markovflow.posterior.PosteriorProcess

Obtain a posterior process for inference.

For this class this is the AnalyticPosteriorProcess built from the variational distribution. This will be a locally optimal variational approximation of the posterior after optimisation.

loss(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor

Return the loss, which is the negative evidence lower bound (ELBO).

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim].

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor

Compute the log density of the data at the new data points.

class SparseCVIGaussianProcess(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, learning_rate=0.1)[source]

Bases: markovflow.models.models.MarkovFlowSparseModel

This is an alternative parameterization to the SparseVariationalGaussianProcess.

Approximates the posterior of a model with a GP prior and a general likelihood, using a Gaussian posterior parameterized with Gaussian sites on inducing states u at inducing points z.

The following notation is used:

  • x - the time points of the training data.

  • z - the time points of the inducing/pseudo points.

  • y - observations corresponding to time points x.

  • s(.) - the continuous time latent state process

  • u = s(z) - the discrete inducing latent state space model

  • f(.) - the noise free predictions of the model

  • p(y | f) - the likelihood

  • t(u) - a site (indices will refer to the associated data point)

  • p(.) the prior distribution

  • q(.) the variational distribution

We use the state space formulation of Markovian Gaussian Processes, which specifies: the conditional density of neighbouring latent states, p(sₖ₊₁| sₖ); and how to read out the latent process from these states, fₖ = H sₖ.

The likelihood p(yₖ | fₖ) links the data to the latent process. We would like to approximate the posterior over the latent state space model.

To approximate the posterior, we maximise the evidence lower bound (ELBO) (ℒ) with respect to the parameters of the variational distribution, since:

log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]

…where:

ℒ(q) = ∫ log(p(s, y) / q(s)) q(s) ds

We parameterize the variational posterior through M sites tₘ(vₘ)

q(s) = p(s) ∏ₘ tₘ(vₘ)

where tₘ(vₘ) are multivariate Gaussian sites on vₘ = [uₘ, uₘ₊₁], i.e. consecutive inducing states.

The sites are parameterized in the natural form

t(v) = exp(𝜽ᵀφ(v) - A(𝜽)), where 𝜽=[θ₁, θ₂] and 𝛗(u)=[Wv, WᵀvᵀvW]

where 𝛗(v) are the sufficient statistics, 𝜽 the natural parameters, and W the projection of the conditional mean, E_p(f|v)[f] = W v.

Each data point indexed k contributes a fraction of the site it belongs to. If vₘ = [uₘ, uₘ₊₁], and zₘ < xₖ <= zₘ₊₁, then xₖ belongs to vₘ.

The natural gradient updates of the sites are similar to those of the CVIGaussianProcess, except that they apply to a different parameterization of the sites.

Parameters
  • kernel – A kernel that defines a prior over functions.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • likelihood – A likelihood.

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • learning_rate – the learning rate.
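
A sketch of the resulting fitting loop, alternating natural-gradient site updates with gradient steps on the hyperparameters. Whether the site parameters appear in trainable_variables is not specified here, so gradients of None are filtered out:

import tensorflow as tf

input_data = (tf.constant(time_points), tf.constant(observations))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
for _ in range(100):
    model.update_sites(input_data)     # natural-gradient step on the sites
    with tf.GradientTape() as tape:
        loss = model.loss(input_data)  # negative ELBO for the hyperparameters
    gradients = tape.gradient(loss, model.trainable_variables)
    grads_and_vars = [
        (g, v) for g, v in zip(gradients, model.trainable_variables) if g is not None
    ]
    optimizer.apply_gradients(grads_and_vars)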

property dist_q

Computes the variational posterior distribution on the vector of inducing states

update_sites(input_data: Tuple[tf.Tensor, tf.Tensor])

Perform one joint update of the Gaussian sites:

𝜽ₘ ← ρ𝜽ₘ + (1-ρ)𝐠ₘ

Here 𝐠ₘ is the sum of the gradients of the variational expectation for each data point indexed k, projected back to the site vₘ through the conditional p(fₖ|vₘ).

Parameters

input_data – A tuple of time points and observations.

loss(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor

Obtain a Tensor representing the loss, which can be used to train the model.

Parameters

input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model.

property posterior

Posterior object to predict outside of the training time points

local_objective_and_gradients(Fmu, Fvar, Y)

Return the local objective and its gradients with respect to the expectation parameters.

Parameters
  • Fmu – Means μ with shape [..., latent_dim].

  • Fvar – Variances σ² with shape [..., latent_dim].

  • Y – Observations Y with shape [..., observation_dim].

Returns

A local objective and gradient with regard to [μ, σ² + μ²].

local_objective(Fmu, Fvar, Y)

Calculate the local loss in CVI.

Parameters
  • Fmu – Means with shape [..., latent_dim].

  • Fvar – Variances with shape [..., latent_dim].

  • Y – Observations with shape [..., observation_dim].

Returns

A local objective with shape [...].

classic_elbo(input_data: Tuple[tf.Tensor, tf.Tensor])

Computes the ELBO the classic way:

ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]

Note: this is mostly for testing purposes and should not be used for optimization.

Parameters

input_data – A tuple of time points and observations

Returns

A scalar tensor representing the ELBO.

property kernel: markovflow.kernels.SDEKernel

Return the kernel of the GP.

property dist_p: markovflow.state_space_model.StateSpaceModel

Return the prior GaussMarkovDistribution.

property likelihood: gpflow.likelihoods.Likelihood

Return the likelihood of the GP.

class SpatioTemporalSparseVariational(inducing_space, inducing_time, kernel_space: gpflow.kernels.Kernel, kernel_time: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, num_data=None)[source]

Bases: SpatioTemporalBase

Model for Variational Spatio-temporal GP regression using a factor kernel k_space_time((s,t),(s',t')) = k_time(t,t') * k_space(s,s')

where k_time is a Markovian kernel.

The following notation is used:

  • X=(x,t) - the space-time points of the training data.

  • zₛ - the space inducing/pseudo points.

  • zₜ - the time inducing/pseudo points.

  • y - observations corresponding to points X.

  • f(.,.) - the spatio-temporal process

  • x(.,.) - the SSM formulation of the spatio-temporal process

  • u(.) = x(zₛ,.) - the spatio-temporal SSM marginalized at zₛ

  • p(y | f) - the likelihood

  • p(.) - the prior distribution

  • q(.) - the variational distribution

This can be seen as the temporal extension of gpflow.SVGP, where instead of fixed inducing variables u, they are now time dependent u(t) and follow a Markov chain.

For a fixed set of spatial inducing inputs zₛ, p(x(zₛ, .)) is a continuous-time process of state dimension Mₛd. For a fixed time slice t, p(x(.,t)) ~ GP(0, kₛ).

The following conditional independence holds: p(x(s,t) | x(zₛ, .)) = p(x(s,t) | s(zₛ, t)), i.e., prediction at a new point at time t given x(zₛ, .) only depends on s(zₛ, t)

This builds a spatially sparse process as q(x(.,.)) = q(x(zₛ, .)) p(x(.,.) |x(zₛ, .)), where the multi-output temporal process q(x(zₛ, .)) is also sparse q(x(zₛ, .)) = q(x(zₛ, zₜ)) p(x(zₛ,.) |x(zₛ, zₜ))

the marginal q(x(zₛ, zₜ)) is a multivariate Gaussian distribution parameterized as a state space model.

Parameters
  • inducing_space – inducing space points [Ms, D]

  • inducing_time – inducing time points [Mt,]

  • kernel_space – Gpflow space kernel

  • kernel_time – Markovflow time kernel

  • likelihood – a likelihood object

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • num_data – number of observations

property dist_q: markovflow.state_space_model.StateSpaceModel

Posterior state space model on inducing states

property dist_p: markovflow.state_space_model.StateSpaceModel

Prior state space model on inducing states

property posterior: markovflow.posterior.PosteriorProcess

Posterior process

class SpatioTemporalSparseCVI(inducing_space, inducing_time, kernel_space: gpflow.kernels.Kernel, kernel_time: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, num_data=None, learning_rate=0.1)[source]

Bases: SpatioTemporalBase

Model for Spatio-temporal GP regression using a factor kernel k_space_time((s,t),(s',t')) = k_time(t,t') * k_space(s,s')

where k_time is a Markovian kernel.

The following notation is used:

  • X=(x,t) - the space-time points of the training data.

  • zₛ - the space inducing/pseudo points.

  • zₜ - the time inducing/pseudo points.

  • y - observations corresponding to points X.

  • f(.,.) - the spatio-temporal process

  • x(.,.) - the SSM formulation of the spatio-temporal process

  • u(.) = x(zₛ,.) - the spatio-temporal SSM marginalized at zₛ

  • p(y | f) - the likelihood

  • p(.) - the prior distribution

  • q(.) - the variational distribution

This can be seen as the spatial extension of markovflow’s SparseCVIGaussianProcess for temporal (only) Gaussian Processes. The inducing variables u(x,t) are now space and time dependent.

For a fixed set of space points zₛ, p(x(zₛ, .)) is a continuous-time process of state dimension Mₛd. For a fixed time slice t, p(x(.,t)) ~ GP(0, kₛ).

The following conditional independence holds: p(x(s,t) | x(zₛ, .)) = p(x(s,t) | s(zₛ, t)), i.e., prediction at a new point at time t given x(zₛ, .) only depends on s(zₛ, t)

This builds a spatially sparse process as q(x(.,.)) = q(x(zₛ, .)) p(x(.,.) |x(zₛ, .)), where the multi-output temporal process q(x(zₛ, .)) is also sparse q(x(zₛ, .)) = q(x(zₛ, zₜ)) p(x(zₛ,.) |x(zₛ, zₜ))

the marginal q(x(zₛ, zₜ)) is parameterized as the product q(x(zₛ, zₜ)) = p(x(zₛ, zₜ)) t(x(zₛ, zₜ)) where p(x(zₛ, zₜ)) is a state space model and t(x(zₛ, zₜ)) are sites.

Parameters
  • inducing_space – inducing space points [Ms, D]

  • inducing_time – inducing time points [Mt,]

  • kernel_space – Gpflow space kernel

  • kernel_time – Markovflow time kernel

  • likelihood – a likelihood object

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • num_data – The total number of observations. (relevant when feeding in external minibatches).

  • learning_rate – the learning rate.

property posterior: markovflow.posterior.PosteriorProcess

Posterior object to predict outside of the training time points

property dist_q: markovflow.state_space_model.StateSpaceModel

Computes the variational posterior distribution on the vector of inducing states

property dist_p: markovflow.state_space_model.StateSpaceModel

Computes the prior distribution on the vector of inducing states

projection_inducing_states_to_observations(input_data: tf.Tensor) → tf.Tensor

Compute the projection matrix of the conditional mean of f(x,t) | s(t).

Parameters

input_data – Time point and associated spatial dimension to generate observations for, with shape batch_shape + [space_dim + 1, num_time_points].

Returns

The projection matrix with shape [num_time_points, obs_dim, num_inducing_time x state_dim ]

update_sites(input_data: Tuple[tf.Tensor, tf.Tensor]) → None

Perform one joint update of the Gaussian sites:

𝜽ₘ ← ρ𝜽ₘ + (1-ρ)𝐠ₘ

Here 𝐠ₘ is the sum of the gradients of the variational expectation for each data point indexed k, projected back to the site vₘ = [uₘ, uₘ₊₁] through the conditional p(fₖ|vₘ).

Parameters

input_data – A tuple of time points and observations.

local_objective_and_gradients(Fmu: tf.Tensor, Fvar: tf.Tensor, Y: tf.Tensor) → tf.Tensor

Return the local objective and its gradients with respect to the expectation parameters.

Parameters
  • Fmu – Means μ with shape [..., latent_dim].

  • Fvar – Variances σ² with shape [..., latent_dim].

  • Y – Observations Y with shape [..., observation_dim].

Returns

A local objective and gradient with regard to [μ, σ² + μ²].

local_objective(Fmu: tf.Tensor, Fvar: tf.Tensor, Y: tf.Tensor) → tf.Tensor

Calculate the local loss in CVI.

Parameters
  • Fmu – Means with shape [..., latent_dim].

  • Fvar – Variances with shape [..., latent_dim].

  • Y – Observations with shape [..., observation_dim].

Returns

A local objective with shape [...].

class VariationalGaussianProcess(input_data: Tuple[tf.Tensor, tf.Tensor], kernel: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, initial_distribution: Optional[markovflow.gauss_markov.GaussMarkovDistribution] = None)[source]

Bases: markovflow.models.models.MarkovFlowModel

Approximates a GaussMarkovDistribution with a general likelihood using a Gaussian posterior.

The following notation is used:

  • \(x\) - the time points of the training data

  • \(y\) - observations corresponding to time points \(x\)

  • \(s(.)\) - the latent state of the Markov chain

  • \(f(.)\) - the noise free predictions of the model

  • \(p(y | f)\) - the likelihood

  • \(p(.)\) - the true distribution

  • \(q(.)\) - the variational distribution

Subscript is used to denote dependence for notational convenience, for example \(fₖ === f(k)\).

With a prior generative model comprising a Gauss-Markov distribution, an emission model and an arbitrary likelihood on the emitted variables, these define:

  • \(p(xₖ₊₁| xₖ)\)

  • \(fₖ = H xₖ\)

  • \(p(yₖ | fₖ)\)

We would like to approximate the posterior of this generative model with a parametric model \(q\), comprising of the same distribution as the prior.

To approximate the posterior, we maximise the evidence lower bound (ELBO) \(ℒ\) with respect to the parameters of the variational distribution, since:

\[log p(y) = ℒ(q) + KL[q ‖ p(f | y)]\]

…where:

\[ℒ(q) = ∫ log(p(f, y) / q(f)) q(f) df\]

Since the last term is non-negative, the ELBO provides a lower bound to the log-likelihood of the model. This bound is exact when \(KL[q ‖ p(f | y)] = 0\); that is, when our approximation is sufficiently flexible to capture the true posterior.

This turns the inference into an optimisation problem: find the optimal \(q\).

To calculate the ELBO, we rewrite it as:

\[ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]\]

The first term is the ‘variational expectation’ of the model likelihood; the second is the KL from the prior to the approximation.

Parameters
  • input_data – A tuple of (time_points, observations) containing the observed data: time points of observations, with shape batch_shape + [num_data], observations with shape batch_shape + [num_data, observation_dim].

  • kernel – A kernel that defines a prior over functions.

  • likelihood – A likelihood.

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
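
A construction sketch for a classification-style model with a Bernoulli likelihood. The Matern32 constructor is an assumed example and binary_observations is a hypothetical tensor of {0, 1} labels:

import tensorflow as tf
import gpflow
import markovflow.kernels

model = VariationalGaussianProcess(
    input_data=(tf.constant(time_points), tf.constant(binary_observations)),
    kernel=markovflow.kernels.Matern32(lengthscale=1.0, variance=1.0),  # assumed constructor
    likelihood=gpflow.likelihoods.Bernoulli(),
)
# Train by minimising model.loss(), i.e. the negative ELBO.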

elbo() → tf.Tensor

Calculate the evidence lower bound (ELBO), a lower bound to \(log p(y)\). We rewrite the ELBO as:

\[ℒ(q(x)) = Σᵢ ∫ log(p(yᵢ | fₓ)) q(fₓ) df - KL[q(sₓ) ‖ p(sₓ)]\]

The first term is the ‘variational expectation’ (VE); the second is the KL divergence from the prior to the approximation.

Returns

A scalar tensor (summed over the batch_shape dimension) representing the ELBO.

property time_points: tf.Tensor

Return the time points of our observations.

Returns

A tensor with shape batch_shape + [num_data].

property observations: tf.Tensor

Return the observations.

Returns

A tensor with shape batch_shape + [num_data, observation_dim].

property kernel: markovflow.kernels.SDEKernel

Return the kernel of the GP.

property likelihood: gpflow.likelihoods.Likelihood

Return the likelihood of the GP.

property mean_function: markovflow.mean_function.MeanFunction

Return the mean function of the GP.

property dist_p: markovflow.gauss_markov.GaussMarkovDistribution

Return the prior Gauss-Markov distribution.

property dist_q: markovflow.gauss_markov.GaussMarkovDistribution

Return the variational distribution as a Gauss-Markov distribution.

property posterior: markovflow.posterior.PosteriorProcess

Obtain a posterior process for inference.

For this class this is the AnalyticPosteriorProcess built from the variational distribution. This will be a locally optimal variational approximation of the posterior after optimisation.

loss() → tf.Tensor

Return the loss, which is the negative ELBO.

class CVIGaussianProcess(input_data: Tuple[tf.Tensor, tf.Tensor], kernel: markovflow.kernels.SDEKernel, likelihood: gpflow.likelihoods.Likelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, learning_rate=0.1)[source]

Bases: GaussianProcessWithSitesBase

Provides an alternative parameterization to a VariationalGaussianProcess.

This class approximates the posterior of a model with a GP prior and a general likelihood using a Gaussian posterior parameterized with Gaussian sites.

The following notation is used:

  • \(x\) - the time points of the training data

  • \(y\) - observations corresponding to time points \(x\)

  • \(s(.)\) - the latent state of the Markov chain

  • \(f(.)\) - the noise free predictions of the model

  • \(p(y | f)\) - the likelihood

  • \(t(f)\) - a site (indices will refer to the associated data point)

  • \(p(.)\) the prior distribution

  • \(q(.)\) the variational distribution

We use the state space formulation of Markovian Gaussian Processes that specifies:

  • The conditional density of neighbouring latent states \(p(sₖ₊₁| sₖ)\)

  • How to read out the latent process from these states \(fₖ = H sₖ\)

The likelihood \(p(yₖ | fₖ)\) links the data to the latent process. We would like to approximate the posterior over the latent state space model.

To approximate the posterior, we maximise the evidence lower bound (ELBO) \(ℒ\) with respect to the parameters of the variational distribution, since:

\[log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]\]

…where:

\[ℒ(q) = ∫ log(p(s, y) / q(s)) q(s) ds\]

We parameterize the variational posterior through sites \(tₖ(fₖ)\):

\[q(s) = p(s) ∏ₖ tₖ(fₖ)\]

…where \(tₖ(fₖ)\) are univariate Gaussian sites parameterized in the natural form:

\[t(f) = exp(𝜽ᵀφ(f) - A(𝜽))\]

…and where \(𝜽=[θ₁,θ₂]\) and \(𝛗(f)=[f,f²]\).

Here, \(𝛗(f)\) are the sufficient statistics and \(𝜽\) are the natural parameters. Note that the subscript \(k\) has been omitted for simplicity.

The natural gradient update of the sites can be shown to be the gradient of the variational expectations:

\[𝐠 = ∇[𝞰][∫ log(p(y=Y|f)) q(f) df]\]

…with respect to the expectation parameters:

\[𝞰 = E[𝛗(f)] = [μ, σ² + μ²]\]

That is, \(𝜽 ← ρ𝜽 + (1-ρ)𝐠\), where \(ρ\) is the learning rate.

The key reference is:

@inproceedings{khan2017conjugate,
  title={Conjugate-Computation Variational Inference: Converting Variational Inference
         in Non-Conjugate Models to Inferences in Conjugate Models},
  author={Khan, Mohammad and Lin, Wu},
  booktitle={Artificial Intelligence and Statistics},
  pages={878--887},
  year={2017}
}
Parameters
  • input_data

    A tuple containing the observed data:

    • Time points of observations with shape batch_shape + [num_data]

    • Observations with shape batch_shape + [num_data, observation_dim]

  • kernel – A kernel that defines a prior over functions.

  • likelihood – A likelihood.

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • learning_rate – The learning rate of the algorithm.

local_objective(Fmu: tf.Tensor, Fvar: tf.Tensor, Y: tf.Tensor) → tf.Tensor

Calculate local loss in CVI.

Parameters
  • Fmu – Means with shape [..., latent_dim].

  • Fvar – Variances with shape [..., latent_dim].

  • Y – Observations with shape [..., observation_dim].

Returns

A local objective with shape [...].

local_objective_and_gradients(Fmu: tf.Tensor, Fvar: tf.Tensor) → tf.Tensor

Return the local objective and its gradients with regard to the expectation parameters.

Parameters
  • Fmu – Means \(μ\) with shape [..., latent_dim].

  • Fvar – Variances \(σ²\) with shape [..., latent_dim].

Returns

A local objective and gradient with regard to \([μ, σ² + μ²]\).

update_sites() → None

Perform one joint update of the Gaussian sites. That is:

\[𝜽 ← ρ𝜽 + (1-ρ)𝐠\]
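
A sketch of the resulting two-stage loop: natural-gradient site updates via update_sites(), interleaved with gradient steps on the kernel hyperparameters using the negative ELBO. Gradients of None (for example, for variables the ELBO does not depend on) are filtered out:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
for _ in range(100):
    model.update_sites()            # natural-gradient step on the sites
    with tf.GradientTape() as tape:
        objective = -model.elbo()   # hyperparameter objective
    gradients = tape.gradient(objective, model.trainable_variables)
    grads_and_vars = [
        (g, v) for g, v in zip(gradients, model.trainable_variables) if g is not None
    ]
    optimizer.apply_gradients(grads_and_vars)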

elbo() → tf.Tensor

Calculate the evidence lower bound (ELBO), a lower bound to \(log p(y)\).

This is done by computing the marginal of the model in which the likelihood terms were replaced by the Gaussian sites.

Returns

A scalar tensor representing the ELBO.

classic_elbo() → tf.Tensor

Compute the ELBO the classic way. That is:

\[ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]\]

Note

This is mostly for testing purposes and should not be used for optimization.

Returns

A scalar tensor representing the ELBO.

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor

Compute the log density of the data at the new data points.

Parameters
  • input_data – A tuple of time points and observations containing the data at which to evaluate the log density: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

  • full_output_cov – Either full output covariance (True) or marginal variances (False).