markovflow.models.sparse_pep

Module containing a model for sparse power expectation propagation (PEP)

Module Contents

class SparsePowerExpectationPropagation(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: markovflow.likelihoods.PEPScalarLikelihood, mean_function: Optional[markovflow.mean_function.MeanFunction] = None, learning_rate=1.0, alpha=1.0)[source]

Bases: markovflow.models.models.MarkovFlowSparseModel

This is the Sparse Power Expectation Propagation (PEP) algorithm.

Approximates the posterior of a model with a GP prior and a general likelihood, using a Gaussian posterior parameterized with Gaussian sites on inducing states u at inducing points z.

The following notation is used:

  • x - the time points of the training data.

  • z - the time points of the inducing/pseudo points.

  • y - observations corresponding to time points x.

  • s(.) - the continuous time latent state process.

  • u = s(z) - the discrete inducing latent state space model.

  • f(.) - the noise free predictions of the model.

  • p(y | f) - the likelihood.

  • t(u) - a site (indices will refer to the associated data point).

  • p(.) - the prior distribution.

  • q(.) - the variational distribution.

We use the state space formulation of Markovian Gaussian Processes, which specifies:

  • the conditional density of neighbouring latent states: p(sₖ₊₁ | sₖ)

  • how to read out the latent process from these states: fₖ = H sₖ
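For intuition, here is a minimal sketch of such a state space model with hand-picked matrices (hypothetical values; in markovflow the transition and emission matrices are derived from the SDE kernel, not set by hand):

    import numpy as np

    # Hypothetical 2-d latent state with a 1-d readout.
    A = np.array([[1.0, 0.1], [0.0, 0.9]])  # transition matrix of p(s_{k+1} | s_k)
    Q = 0.01 * np.eye(2)                    # process noise covariance
    H = np.array([[1.0, 0.0]])              # emission: f_k = H s_k

    rng = np.random.default_rng(0)
    s = np.zeros(2)
    fs = []
    for _ in range(5):
        # draw s_{k+1} ~ N(A s_k, Q)
        s = A @ s + rng.multivariate_normal(np.zeros(2), Q)
        # read out the latent process: f_k = H s_k
        fs.append((H @ s).item())
    print(fs)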

The likelihood links data to the latent process via p(yₖ | fₖ). We would like to approximate the posterior over the latent states of this model.

To approximate the posterior, we maximise the evidence lower bound (ELBO) (ℒ) with respect to the parameters of the variational distribution, since:

log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]

…where:

ℒ(q) = ∫ log(p(s, y) / q(s)) q(s) ds
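This identity can be checked numerically in a conjugate model where every term is available in closed form. A minimal sketch with a hypothetical one-dimensional Gaussian model (illustration only, not markovflow code):

    import numpy as np

    # Conjugate 1-d model: f ~ N(0, vp), y | f ~ N(f, vn).
    vp, vn, y = 2.0, 0.5, 1.3
    m, v = 0.7, 0.4        # an arbitrary Gaussian q(f) = N(m, v)

    # Exact evidence: p(y) = N(y; 0, vp + vn)
    log_py = -0.5 * (np.log(2 * np.pi * (vp + vn)) + y**2 / (vp + vn))

    # ELBO(q) = E_q[log p(f)] + E_q[log p(y | f)] + H[q]
    e_log_prior = -0.5 * (np.log(2 * np.pi * vp) + (m**2 + v) / vp)
    e_log_lik = -0.5 * (np.log(2 * np.pi * vn) + ((y - m) ** 2 + v) / vn)
    entropy = 0.5 * np.log(2 * np.pi * np.e * v)
    elbo = e_log_prior + e_log_lik + entropy

    # Exact posterior p(f | y) = N(ms, vs) and KL[q || posterior]
    vs = 1.0 / (1.0 / vp + 1.0 / vn)
    ms = vs * y / vn
    kl = 0.5 * (np.log(vs / v) + (v + (m - ms) ** 2) / vs - 1.0)

    assert np.isclose(log_py, elbo + kl)   # log p(y) = L(q) + KL[q || p(f | y)]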

We parameterize the variational posterior through M sites tₘ(vₘ):

q(s) = p(s) ∏ₘ tₘ(vₘ)

where tₘ(vₘ) are multivariate Gaussian sites on vₘ = [uₘ, uₘ₊₁], i.e. consecutive inducing states.

The sites are parameterized in the natural form:

t(v) = exp(𝜽ᵀ𝛗(v) - A(𝜽)), where 𝜽 = [θ₁, θ₂] and 𝛗(v) = [v, vvᵀ]

where 𝛗(v) are the sufficient statistics and 𝜽 are the natural parameters.
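To make the natural form concrete, a small sketch (hypothetical numbers) converting a Gaussian site between moment and natural parameters:

    import numpy as np

    # Moment form of a hypothetical 2-d Gaussian site
    mu = np.array([0.3, -0.1])
    Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])

    # Natural parameters: theta1 = Sigma^{-1} mu, theta2 = -0.5 Sigma^{-1}
    P = np.linalg.inv(Sigma)
    theta1, theta2 = P @ mu, -0.5 * P

    def log_t(v):
        # log t(v) = theta1' v + tr(theta2 v v'), up to the log-normaliser A(theta)
        return theta1 @ v + v @ theta2 @ v

    print(log_t(np.array([0.1, 0.2])))

    # Round trip back to moment form
    Sigma_back = np.linalg.inv(-2.0 * theta2)
    mu_back = Sigma_back @ theta1
    assert np.allclose(Sigma, Sigma_back) and np.allclose(mu, mu_back)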

Parameters
  • kernel – A kernel that defines a prior over functions.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • likelihood – A likelihood.

  • mean_function – The mean function for the GP. Defaults to no mean function.

  • learning_rate – the learning rate used to damp the site updates.

  • alpha – the power α, as in Power Expectation Propagation.
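A sketch of how the model might be constructed (the kernel and likelihood constructors named below are assumptions about your markovflow/GPflow versions; check their signatures before use):

    import numpy as np
    import tensorflow as tf
    from gpflow.likelihoods import Gaussian
    from markovflow.kernels import Matern32
    from markovflow.likelihoods import PEPScalarLikelihood
    from markovflow.models.sparse_pep import SparsePowerExpectationPropagation

    time_points = tf.constant(np.linspace(0.0, 10.0, 100))
    observations = tf.constant(np.sin(np.linspace(0.0, 10.0, 100))[:, None])
    inducing_points = tf.constant(np.linspace(0.0, 10.0, 20))

    # Assumed constructors; PEPScalarLikelihood is taken to wrap a GPflow likelihood.
    kernel = Matern32(lengthscale=1.0, variance=1.0)
    likelihood = PEPScalarLikelihood(Gaussian(variance=0.1))

    model = SparsePowerExpectationPropagation(
        kernel=kernel,
        inducing_points=inducing_points,
        likelihood=likelihood,
        learning_rate=0.5,
        alpha=0.5,  # fractional power; alpha = 1 recovers standard EP
    )
    # typical loop: repeatedly apply site updates on (minibatches of) the data
    # model.update_sites((time_points, observations))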

posterior()[source]

Return the posterior process.

mask_indices(exclude_indices)[source]

Return a binary mask excluding the given data indices.

Parameters

exclude_indices – the data indices to exclude.

back_project_nats(nat1, nat2, time_points)[source]

Back-project the natural gradients associated with the given time points onto their associated inducing sites.

local_objective(Fmu, Fvar, Y)[source]

Local objective of the PEP algorithm: log E_q(f)[p(y|f)ᵅ]
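A sketch of this quantity for a scalar Gaussian likelihood, computed with Gauss–Hermite quadrature as in the general-likelihood case (NumPy illustration with hypothetical values; markovflow computes this in TensorFlow through the likelihood object):

    import numpy as np

    def pep_local_objective(fmu, fvar, y, alpha=1.0, noise_var=0.1, n=20):
        """log E_{N(f; fmu, fvar)}[ p(y | f)^alpha ] for a Gaussian likelihood."""
        x, w = np.polynomial.hermite.hermgauss(n)   # nodes/weights for e^{-x^2}
        f = fmu + np.sqrt(2.0 * fvar) * x           # change of variables
        log_lik = -0.5 * (np.log(2 * np.pi * noise_var)
                          + (y - f) ** 2 / noise_var)
        # log-sum-exp over quadrature points for numerical stability
        a = alpha * log_lik + np.log(w / np.sqrt(np.pi))
        m = a.max()
        return m + np.log(np.exp(a - m).sum())

    print(pep_local_objective(0.0, 1.0, 0.5, alpha=0.8))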

local_objective_gradients(fx_mus, fx_covs, observations, alpha=1.0)[source]

Gradients of the local objective of the PEP algorithm with respect to the predictive mean.

fraction_sites(time_points)[source]

For each segment m between consecutive inducing points, [z_m, z_m+1), this counts the time points t falling in that segment, c(m) = #{t : z_m ≤ t < z_m+1}, and returns 1/c(m), or 0 when c(m) = 0.

Parameters

time_points – tensor of shape batch_shape + [num_data]

Returns

tensor of shape batch_shape + [num_data]
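A NumPy sketch of the documented behaviour (illustration only; the actual implementation works on TensorFlow tensors with batch shapes):

    import numpy as np

    def fraction_sites(time_points, inducing_points):
        """1/c(m) per data point, where c(m) counts data in [z_m, z_{m+1})."""
        # segment index m of each time point, such that z_m <= t < z_{m+1}
        seg = np.searchsorted(inducing_points, time_points, side="right")
        counts = np.bincount(seg, minlength=len(inducing_points) + 1)
        c = counts[seg]                      # c(m) for each data point
        return np.where(c > 0, 1.0 / np.maximum(c, 1), 0.0)

    z = np.array([1.0, 2.0, 3.0])
    t = np.array([0.5, 1.2, 1.7, 2.5])
    print(fraction_sites(t, z))              # [1.  0.5 0.5 1. ]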

compute_posterior_ssm(nat1, nat2)[source]

Computes the variational posterior distribution on the vector of inducing states

property dist_q[source]

Computes the variational posterior distribution on the vector of inducing states

compute_marginals()[source]

Compute the pairwise marginals of consecutive inducing states.

remove_cavity_from_marginals(time_points, marginals)[source]

Remove the cavity from the marginals.

Parameters
  • time_points – the time points.

  • marginals – pairwise mean and covariance tensors.

compute_cavity_state(time_points)[source]

The cavity distributions for data points at input time_points. This corresponds to the marginal distribution q⁻ⁿ(fₙ) of q⁻ⁿ(s) = q(s) / tₘ(vₘ)ᵝ, where β = α · (1 / #time points touching site tₘ).
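In natural parameters, dividing by a fractional site amounts to subtracting a β-fraction of the site's natural parameters. A one-dimensional sketch with hypothetical numbers:

    import numpy as np

    # Cavity in natural parameters (1-d sketch): q^{-n} = q / t_m^beta
    nat1_q, nat2_q = 1.2, -0.8          # natural params of the q marginal
    nat1_t, nat2_t = 0.4, -0.3          # natural params of site t_m
    alpha, n_points = 0.8, 4            # 4 time points touch site t_m
    beta = alpha / n_points

    nat1_cav = nat1_q - beta * nat1_t
    nat2_cav = nat2_q - beta * nat2_t

    # back to moment form: var = -1/(2 nat2), mean = var * nat1
    var_cav = -0.5 / nat2_cav
    mean_cav = var_cav * nat1_cav
    print(mean_cav, var_cav)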

compute_cavity(time_points)[source]

Compute the cavity on f.

Parameters

time_points – the time points.

compute_new_sites(input_data)[source]

Compute the site updates and perform one update step.

Parameters

input_data – A tuple of time points and observations containing the data from which to calculate the updates: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

compute_log_norm(input_data)[source]

Compute the log normalizer for the given data.

Parameters

input_data – A tuple of time points and observations: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

compute_num_data_per_interval(time_points)[source]

Compute the number of data points per inter-inducing-point interval.

compute_fraction(time_points)[source]

Compute the fraction of a site associated with each data point.

update_sites(input_data)[source]

Apply the site updates computed from the input data.

energy(input_data)[source]

The PEP energy: ∫ p(s) ∏ₘ tₘ(vₘ) ds

Parameters

input_data – the input data.
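For a one-dimensional Gaussian prior and unnormalised Gaussian sites, this integral has a closed form, which the following sketch (hypothetical site parameters) checks against Monte Carlo:

    import numpy as np

    # 1-d sketch: log int p(s) prod_m t_m(s) ds for a N(0, vp) prior and
    # unnormalised Gaussian sites t_m(s) = exp(theta1_m s + theta2_m s^2).
    vp = 2.0
    theta1 = np.array([0.5, -0.2])          # hypothetical site parameters
    theta2 = np.array([-0.3, -0.1])

    t1, t2 = theta1.sum(), theta2.sum()     # sites multiply -> params add
    a = 1.0 / (2.0 * vp) - t2               # combined quadratic coefficient
    log_energy = (-0.5 * np.log(2 * np.pi * vp)
                  + 0.5 * np.log(np.pi / a)
                  + t1 ** 2 / (4.0 * a))

    # Monte Carlo check of the same integral
    rng = np.random.default_rng(0)
    s = rng.normal(0.0, np.sqrt(vp), 200_000)
    mc = np.log(np.mean(np.exp(t1 * s + t2 * s ** 2)))
    print(log_energy, mc)                   # should agree to ~2 decimals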

loss(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]

Return the loss, which is the negative evidence lower bound (ELBO).

Parameters

input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model.

property dist_p: markovflow.gauss_markov.GaussMarkovDistribution[source]

Return the prior GaussMarkovDistribution.

property kernel: markovflow.kernels.SDEKernel[source]

Return the kernel of the GP.

classic_elbo(input_data: Tuple[tf.Tensor, tf.Tensor])[source]

Computes the ELBO the classic way:

ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]

Note: this is mostly for testing purposes and is not meant to be used for optimization.

Parameters

input_data – A tuple of time points and observations

Returns

A scalar tensor representing the ELBO.

predict_log_density(input_data: Tuple[tf.Tensor, tf.Tensor], full_output_cov: bool = False) → tf.Tensor[source]

Compute the log density of the data at the new data points.

Parameters
  • input_data – A tuple of time points and observations containing the data at which to calculate the log density: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].

  • full_output_cov – Either full output covariance (True) or marginal variances (False).