markovflow.models.iwvi

Module containing a model for importance-weighted variational inference.

Module Contents

class ImportanceWeightedVI(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: gpflow.likelihoods.Likelihood, num_importance_samples: int, initial_distribution: Optional[markovflow.state_space_model.StateSpaceModel] = None, mean_function: Optional[markovflow.mean_function.MeanFunction] = None)[source]

Bases: markovflow.models.sparse_variational.SparseVariationalGaussianProcess

Performs importance-weighted variational inference (IWVI).

The key reference is:

@inproceedings{domke2018importance,
  title={Importance weighting and variational inference},
  author={Domke, Justin and Sheldon, Daniel R},
  booktitle={Advances in neural information processing systems},
  pages={4470--4479},
  year={2018}
}

The idea is based on the observation that an estimator of the evidence lower bound (ELBO) can be obtained from an importance weight w:

L = log w(x),    x ~ q(x)

…where x is the latent variable of the model (a GP, or set of GPs in our case) and the function w is:

w(x) = p(y | x) p(x) / q(x)

It follows that:

ELBO = 𝔼[L]

…and, by Jensen's inequality:

log p(y) = log 𝔼[w(x)] >= 𝔼[log w(x)] = ELBO

It turns out that there is a series of lower bounds given by taking multiple importance samples:

Lₙ = log (1/n) Σᵢ₌₁ⁿ w(xᵢ),    xᵢ ~ q(x)

And we have the relation:

log p(y) >= 𝔼[Lₙ] >= 𝔼[Lₙ₋₁] >= ... >= 𝔼[L₁] = ELBO
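
For illustration only (not part of this class's API), a minimal sketch of evaluating the bound Lₙ stably in log-space, assuming the log-weights log w(xᵢ) are already available as a tensor:

import tensorflow as tf

def iw_lower_bound(log_w: tf.Tensor) -> tf.Tensor:
    """Estimate Lₙ = log (1/n) Σᵢ w(xᵢ) from log-weights of shape [n].

    Computed as logsumexp(log_w) - log(n), which avoids overflow when
    exponentiating large log-weights.
    """
    n = tf.cast(tf.shape(log_w)[0], log_w.dtype)
    return tf.reduce_logsumexp(log_w) - tf.math.log(n)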

This means we can tighten the bound on the log marginal likelihood by increasing n, which this class refers to as num_importance_samples. The trade-offs are:

  • The objective function is now always stochastic, even in cases where the ELBO of the parent class is non-stochastic.

  • We have to do more computation (the weights must be evaluated n times).

Parameters
  • kernel – A kernel that defines a prior over functions.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • likelihood – A likelihood.

  • num_importance_samples – The number of samples for the importance-weighted estimator.

  • initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].

  • mean_function – The mean function for the GP. Defaults to no mean function.
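
A usage sketch only: the Matern32 kernel and Gaussian likelihood below, and their constructor arguments, are assumptions about the surrounding markovflow/GPflow API rather than part of this class.

import numpy as np
import tensorflow as tf
import gpflow
from markovflow.kernels import Matern32
from markovflow.models.iwvi import ImportanceWeightedVI

# Assumed constructors for the kernel and likelihood (check their signatures).
kernel = Matern32(lengthscale=1.0, variance=1.0)
likelihood = gpflow.likelihoods.Gaussian(variance=0.1)

# Inducing points on a regular grid over the input range, shape [num_inducing].
inducing_points = tf.constant(np.linspace(0.0, 10.0, 20))

model = ImportanceWeightedVI(
    kernel=kernel,
    inducing_points=inducing_points,
    likelihood=likelihood,
    num_importance_samples=10,
)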

elbo(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]

Compute the importance-weighted ELBO using K samples. The procedure is:

for k = 1...K:
    uₖ ~ q(u)
    sₖ ~ p(s | uₖ)
    wₖ = p(y | sₖ) p(uₖ) / q(uₖ)

ELBO = log (1/K) Σₖ wₖ

Everything is computed in log-space for numerical stability. Note that gradients of this ELBO may have high variance with respect to the variational parameters; see the dregs_objective() method for a variance-reduced gradient estimator.

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor.
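
Continuing the construction sketch above, a brief illustrative call; the synthetic data below is an assumption, shaped as described in the parameter list:

# time_points: batch_shape + [num_data]; observations: batch_shape + [num_data, observation_dim]
time_points = tf.constant(np.linspace(0.0, 10.0, 100))
observations = tf.sin(time_points)[:, None]

# Each call draws fresh importance samples, so repeated calls give different estimates.
loss_estimate = model.elbo((time_points, observations))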

dregs_objective(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]

Compute a scalar tensor that, when differentiated using tf.gradients, produces the DREGS variance-controlled gradient.

See “Doubly Reparameterized Gradient Estimators For Monte Carlo Objectives” for a derivation.

We recommend using these gradients to train the variational parameters, and using gradients of the importance-weighted ELBO to train the hyperparameters.

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor.
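
A hedged sketch of that recommendation, reusing the model, kernel and likelihood from the earlier sketch. The optimizer and the way variational parameters are identified (by excluding kernel and likelihood variables from model.trainable_variables) are assumptions about how the model stores its variables, not documented behaviour.

optimizer = tf.optimizers.Adam(learning_rate=0.01)

def training_step(input_data):
    # Assumption: hyperparameters live on the kernel and likelihood modules;
    # everything else trainable on the model is treated as a variational parameter.
    hyper_vars = kernel.trainable_variables + likelihood.trainable_variables
    hyper_ids = {id(v) for v in hyper_vars}
    variational_vars = [v for v in model.trainable_variables if id(v) not in hyper_ids]

    with tf.GradientTape(persistent=True) as tape:
        neg_elbo = -model.elbo(input_data)              # drives the hyperparameters
        neg_dregs = -model.dregs_objective(input_data)  # drives the variational parameters

    optimizer.apply_gradients(zip(tape.gradient(neg_elbo, hyper_vars), hyper_vars))
    optimizer.apply_gradients(zip(tape.gradient(neg_dregs, variational_vars), variational_vars))
    del tape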