markovflow.models.sparse_pep
Module containing a model for Sparse Power Expectation Propagation (PEP)
SparsePowerExpectationPropagation
Bases: markovflow.models.models.MarkovFlowSparseModel
This is the Sparse Power Expectation Propagation algorithm.
It approximates the posterior of a model with a GP prior and a general likelihood, using a Gaussian posterior parameterized with Gaussian sites on inducing states u at inducing points z.
The following notation is used:
x - the time points of the training data.
z - the time points of the inducing/pseudo points.
y - observations corresponding to time points x.
s(.) - the continuous time latent state process.
u = s(z) - the discrete inducing latent states.
f(.) - the noise free predictions of the model.
p(y | f) - the likelihood.
t(u) - a site (indices will refer to the associated data point).
p(.) - the prior distribution.
q(.) - the variational distribution.
We use the state space formulation of Markovian Gaussian Processes, which specifies:
the conditional density of neighbouring latent states: p(sₖ₊₁ | sₖ)
how to read out the latent process from these states: fₖ = H sₖ
The likelihood links the data to the latent process via p(yₖ | fₖ). We would like to approximate the posterior over the latent state space model of this model.
To approximate the posterior, we maximise the evidence lower bound (ELBO) (ℒ) with respect to the parameters of the variational distribution, since:
log p(y) = ℒ(q) + KL[q(s) ‖ p(s | y)]
…where:
ℒ(q) = ∫ log(p(s, y) / q(s)) q(s) ds
We parameterize the variational posterior through M sites tₘ(vₘ):
q(s) = p(s) ∏ₘ tₘ(vₘ)
where tₘ(vₘ) are multivariate Gaussian sites on vₘ = [uₘ, uₘ₊₁], i.e. consecutive inducing states.
The sites are parameterized in the natural form
t(v) = exp(𝜽ᵀ𝛗(v) - A(𝜽)), with 𝜽 = [θ₁, θ₂] and 𝛗(v) = [v, vvᵀ],
where 𝛗(v) are the sufficient statistics and 𝜽 the natural parameters.
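As an illustration, the mapping between the moment and natural parameterizations of a multivariate Gaussian site can be sketched in numpy (illustrative only, not part of the markovflow API):

import numpy as np

def moments_to_naturals(mu, cov):
    # Moment -> natural parameters of a multivariate Gaussian:
    # theta_1 = cov^{-1} mu, theta_2 = -0.5 * cov^{-1}
    precision = np.linalg.inv(cov)
    return precision @ mu, -0.5 * precision

def naturals_to_moments(theta_1, theta_2):
    # Inverse mapping: cov = (-2 * theta_2)^{-1}, mu = cov @ theta_1
    cov = np.linalg.inv(-2.0 * theta_2)
    return cov @ theta_1, cov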
kernel – A kernel that defines a prior over functions.
inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].
likelihood – A likelihood.
mean_function – The mean function for the GP. Defaults to no mean function.
learning_rate – The learning rate.
alpha – The power, as in Power Expectation Propagation.
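As a usage illustration, a hypothetical construction might look as follows; the kernel and likelihood classes, their import paths, and the exact constructor arguments are assumptions and may differ in the installed markovflow version:

import numpy as np
from markovflow.kernels import Matern32
from markovflow.likelihoods import Gaussian
from markovflow.models.sparse_pep import SparsePowerExpectationPropagation

# Hypothetical usage sketch; argument names mirror the parameters above.
z = np.linspace(0.0, 10.0, 20)  # inducing points, shape [num_inducing]
model = SparsePowerExpectationPropagation(
    kernel=Matern32(lengthscale=1.0, variance=1.0),
    inducing_points=z,
    likelihood=Gaussian(variance=0.1),
    learning_rate=0.1,
    alpha=0.5,  # power; alpha -> 0 approaches variational inference, alpha = 1 is EP
)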
posterior
Posterior process.
mask_indices
Binary mask to exclude data indices.
exclude_indices – the data indices to exclude.
back_project_nats
Back-project the natural gradients associated with the time points onto their associated inducing sites.
local_objective
Local objective of the PEP algorithm: log E_q(f)[p(y|f)ᵃ].
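For a scalar Gaussian marginal q(f) = N(f; m, v), this quantity can be approximated with Gauss-Hermite quadrature; a minimal numpy sketch (not the markovflow implementation):

import numpy as np

def local_objective(y, mean, var, log_likelihood, alpha, num_points=20):
    # Approximate log E_{q(f)}[p(y|f)^alpha] for scalar f with
    # q(f) = N(f; mean, var), via Gauss-Hermite quadrature and the
    # change of variables f = mean + sqrt(2 * var) * x.
    x, w = np.polynomial.hermite.hermgauss(num_points)
    f = mean + np.sqrt(2.0 * var) * x
    log_terms = alpha * log_likelihood(y, f) + np.log(w) - 0.5 * np.log(np.pi)
    m = log_terms.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_terms - m).sum())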
local_objective_gradients
Gradients of the local objective of the PEP algorithm with respect to the predictive mean.
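Reusing local_objective from the sketch above, such a gradient can be sanity-checked by finite differences (illustrative; the likelihood, noise variance, and data values are arbitrary assumptions):

import numpy as np

# Gaussian log-likelihood with noise variance 0.1 (illustrative choice)
log_lik = lambda y, f: -0.5 * (y - f) ** 2 / 0.1 - 0.5 * np.log(2.0 * np.pi * 0.1)
y, mean, var, alpha, eps = 0.3, 0.0, 1.0, 0.5, 1e-6

# central finite difference of the local objective w.r.t. the predictive mean
grad_mean = (local_objective(y, mean + eps, var, log_lik, alpha)
             - local_objective(y, mean - eps, var, log_lik, alpha)) / (2.0 * eps)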
fraction_sites
For each segment m between consecutive inducing points, [zₘ, zₘ₊₁), this counts the time points t falling in that segment, c(m) = #{t : zₘ ≤ t < zₘ₊₁}, and returns 1/c(m), or 0 when c(m) = 0.
time_points – tensor of shape batch_shape + [num_data]
Returns a tensor of shape batch_shape + [num_data].
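A minimal numpy sketch of this counting logic (illustrative, not the library code); each data point receives 1/c(m) for its own interval, so the division is always well defined here:

import numpy as np

def fraction_sites(time_points, inducing_points):
    # Label each time point with the index of its inducing interval;
    # points with the same label share an interval [z_m, z_{m+1}).
    segment = np.searchsorted(inducing_points, time_points, side="right")
    counts = np.bincount(segment, minlength=len(inducing_points) + 1)
    return 1.0 / counts[segment]  # 1 / c(m) for each data point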
compute_posterior_ssm
Compute the variational posterior distribution on the vector of inducing states.
dist_q
Return the variational posterior distribution q(s).
compute_marginals
Compute pairwise marginals.
remove_cavity_from_marginals
Remove the cavity from the marginals.
time_points – the time points.
marginals – pairwise mean and covariance tensors.
compute_cavity_state
The cavity distributions for data points at the input time_points. This corresponds to the marginal distribution qᐠⁿ(fₙ) of qᐠⁿ(s) = q(s) / tₘ(vₘ)ᵝ, where β = a · (1 / #{time points touching site tₘ}).
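In natural parameters, dividing out a fraction of a site reduces to a scaled subtraction; a schematic sketch (assuming the marginal and the site are both stored as natural parameters):

def cavity_naturals(q_nat_1, q_nat_2, site_nat_1, site_nat_2, alpha, num_touching):
    # Dividing exponential-family densities subtracts natural parameters,
    # so removing a fraction beta of a site is a scaled subtraction.
    beta = alpha / num_touching  # beta = a * (1 / #time points touching the site)
    return q_nat_1 - beta * site_nat_1, q_nat_2 - beta * site_nat_2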
compute_cavity
Cavity distribution on f.
time_points – the time points.
compute_new_sites
Compute the site updates and perform one update step.
input_data – A tuple of time points and observations containing the data from which to calculate the updates: a tensor of inputs with shape batch_shape + [num_data], and a tensor of observations with shape batch_shape + [num_data, observation_dim].
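A hypothetical fitting loop, reusing the model constructed in the earlier sketch (method names as documented here; the toy data and number of passes are assumptions):

import numpy as np

# Toy data: shapes batch_shape + [num_data] and batch_shape + [num_data, 1]
time_points = np.linspace(0.0, 10.0, 100)
observations = np.sin(time_points)[:, None]
input_data = (time_points, observations)

for step in range(20):
    model.compute_new_sites(input_data)  # one PEP update pass over the data
    print(step, model.energy(input_data))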
compute_log_norm
compute_num_data_per_interval
Compute the number of data points falling in each inducing-point interval.
compute_fraction
Compute the fraction of a site attributed to each data point.
update_sites
Apply the computed site updates.
energy
The PEP energy: ∫ p(s) ∏ₘ tₘ(vₘ) ds.
input_data – the input data.
loss
Return the loss, which is the negative evidence lower bound (ELBO).
input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model.
dist_p
Return the prior GaussMarkovDistribution.
kernel
Return the kernel of the GP.
classic_elbo
ℒ(q) = Σᵢ ∫ log(p(yᵢ | f)) q(f) df - KL[q(f) ‖ p(f)]
Note: this is mostly for testing purposes and is not to be used for optimization.
input_data – A tuple of time points and observations
A scalar tensor representing the ELBO.
predict_log_density
Compute the log density of the data at the new data points.
input_data – A tuple of time points and observations containing the data at which to calculate the log density: a tensor of inputs with shape batch_shape + [num_data], a tensor of observations with shape batch_shape + [num_data, observation_dim].
full_output_cov – Either full output covariance (True) or marginal variances (False).