markovflow.models.iwvi

Module containing a model for importance-weighted variational inference.

Module Contents

class ImportanceWeightedVI(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: gpflow.likelihoods.Likelihood, num_importance_samples: int, initial_distribution: Optional[markovflow.state_space_model.StateSpaceModel] = None, mean_function: Optional[markovflow.mean_function.MeanFunction] = None)[source]

Bases: markovflow.models.sparse_variational.SparseVariationalGaussianProcess

Performs importance-weighted variational inference (IWVI).

The key reference is:

@inproceedings{domke2018importance,
  title={Importance weighting and variational inference},
  author={Domke, Justin and Sheldon, Daniel R},
  booktitle={Advances in neural information processing systems},
  pages={4470--4479},
  year={2018}
}

The idea is based on the observation that an estimator of the evidence lower bound (ELBO) can be obtained from an importance weight \(w\):

\[L_1 = \log w(x_1), \quad x_1 \sim q(x)\]

…where \(x\) is the latent variable of the model (a GP, or set of GPs in our case) and the function \(w\) is:

\[w(x) = p(y | x) p(x) / q(x)\]

It follows that:

\[\mathrm{ELBO} = \mathbb{E}_{x_1}[L_1]\]

…and:

\[\log p(y) = \log \mathbb{E}_{x_1}[w(x_1)]\]

It turns out that there is a series of lower bounds, obtained by taking multiple importance samples:

\[L_n = \log \frac{1}{n} \sum_{i=1}^{n} w(x_i), \quad x_i \sim q(x)\]

And we have the relation:

\[\log p(y) \geq \mathbb{E}[L_n] \geq \mathbb{E}[L_{n-1}] \geq \dots \geq \mathbb{E}[L_1] = \mathrm{ELBO}\]

This means we can tighten the bound on the log marginal likelihood by increasing \(n\), which we refer to in this class as num_importance_samples; a minimal estimator sketch follows the list below. The trade-offs are:

  • The objective function is now always stochastic, even for cases where the ELBO of the parent class is non-stochastic

  • We have to do more computations (evaluate the weights \(n\) times)
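For intuition, the \(L_n\) estimate is computed stably in log-space. Below is a minimal sketch (not part of the markovflow API), assuming a hypothetical tensor log_w that holds \(\log w(x_i)\) for \(n\) samples \(x_i \sim q(x)\):

import tensorflow as tf

def iw_bound_estimate(log_w: tf.Tensor) -> tf.Tensor:
    """Estimate L_n = log (1/n) sum_i w(x_i) from a vector of log importance weights."""
    n = tf.cast(tf.shape(log_w)[0], log_w.dtype)
    # log (1/n) sum_i exp(log w_i) = logsumexp(log_w) - log n
    return tf.reduce_logsumexp(log_w, axis=0) - tf.math.log(n)

Working with log-weights and tf.reduce_logsumexp avoids overflow or underflow when the weights span many orders of magnitude.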

Parameters
  • kernel – A kernel that defines a prior over functions.

  • inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].

  • likelihood – A likelihood.

  • num_importance_samples – The number of samples for the importance-weighted estimator.

  • initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].

  • mean_function – The mean function for the GP. Defaults to no mean function.
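Example (a minimal construction sketch; the kernel choice, argument names, and toy data below are illustrative assumptions rather than requirements of this class):

import numpy as np
import tensorflow as tf
import gpflow
from markovflow.kernels import Matern32  # assumed kernel; any SDEKernel works
from markovflow.models.iwvi import ImportanceWeightedVI

# Toy data and inducing points (placeholder values).
time_points = np.linspace(0.0, 10.0, 100)
observations = np.sin(time_points)[:, None]
inducing_points = tf.constant(np.linspace(0.0, 10.0, 20), dtype=tf.float64)

model = ImportanceWeightedVI(
    kernel=Matern32(lengthscale=1.0, variance=1.0),  # argument names are assumptions
    inducing_points=inducing_points,
    likelihood=gpflow.likelihoods.Gaussian(variance=0.1),
    num_importance_samples=10,
)

input_data = (tf.constant(time_points), tf.constant(observations))
loss = -model.elbo(input_data)  # scalar, stochastic training objective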

elbo(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]

Compute the importance-weighted ELBO using K samples. The procedure is:

for k=1...K:
    uₖ ~ q(u)
    sₖ ~ p(s | u)
    wₖ = p(y | sₖ)p(uₖ) / q(uₖ)

ELBO = log (1/K) Σₖ wₖ

Everything is computed in log-space for numerical stability. Note that gradients of this ELBO may have high variance with respect to the variational parameters; see the dregs_objective method for a lower-variance gradient estimator.
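As a conceptual sketch of the procedure above (the distribution objects and callables are hypothetical stand-ins, for example tensorflow_probability-style distributions, not markovflow classes; the real method works with the model's state-space representations):

import tensorflow as tf

def iw_elbo_sketch(q_u, p_u, sample_s_given_u, log_p_y_given_s, num_samples):
    """K-sample importance-weighted ELBO estimate, computed in log-space."""
    log_ws = []
    for _ in range(num_samples):
        u_k = q_u.sample()                 # u_k ~ q(u)
        s_k = sample_s_given_u(u_k)        # s_k ~ p(s | u_k)
        # log w_k = log p(y | s_k) + log p(u_k) - log q(u_k)
        log_ws.append(log_p_y_given_s(s_k) + p_u.log_prob(u_k) - q_u.log_prob(u_k))
    log_w = tf.stack(log_ws, axis=0)
    k = tf.cast(num_samples, log_w.dtype)
    return tf.reduce_logsumexp(log_w, axis=0) - tf.math.log(k)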

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor.

dregs_objective(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]

Compute a scalar tensor that, when differentiated using tf.gradients, produces the DREGS variance-controlled gradient.

See “Doubly Reparameterized Gradient Estimators For Monte Carlo Objectives” for a derivation.

We recommend using these gradients for training the variational parameters, and gradients of the importance-weighted ELBO (see elbo) for training the hyperparameters.
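A hedged sketch of one way to combine the two objectives in a single training step; how the variational parameters and hyperparameters are collected into the two lists depends on the model's attributes and is left to the caller:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

def training_step(model, input_data, variational_vars, hyper_vars):
    """DREGS gradients for the variational parameters, IW-ELBO gradients for the rest."""
    with tf.GradientTape(persistent=True) as tape:
        dregs_loss = -model.dregs_objective(input_data)  # low-variance for variational params
        elbo_loss = -model.elbo(input_data)              # for hyperparameters
    grads = tape.gradient(dregs_loss, variational_vars) + tape.gradient(elbo_loss, hyper_vars)
    optimizer.apply_gradients(zip(grads, variational_vars + hyper_vars))
    del tape
    return elbo_loss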

Parameters

input_data

A tuple of time points and observations containing the data at which to calculate the loss for training the model:

  • A tensor of inputs with shape batch_shape + [num_data]

  • A tensor of observations with shape batch_shape + [num_data, observation_dim]

Returns

A scalar tensor.