markovflow.models.iwvi¶
Module containing a model for importance-weighted variational inference.
Module Contents¶
-
class ImportanceWeightedVI(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: gpflow.likelihoods.Likelihood, num_importance_samples: int, initial_distribution: Optional[markovflow.state_space_model.StateSpaceModel] = None, mean_function: Optional[markovflow.mean_function.MeanFunction] = None)[source]¶

Bases: markovflow.models.sparse_variational.SparseVariationalGaussianProcess
Performs importance-weighted variational inference (IWVI).
The key reference is:
@inproceedings{domke2018importance,
  title={Importance weighting and variational inference},
  author={Domke, Justin and Sheldon, Daniel R},
  booktitle={Advances in Neural Information Processing Systems},
  pages={4470--4479},
  year={2018}
}
The idea is based on the observation that an estimator of the evidence lower bound (ELBO) can be obtained from an importance weight w:
L₁ = log w(x₁),   x₁ ~ q(x)

...where x is the latent variable of the model (a GP, or a set of GPs in our case) and the function w is:

w(x) = p(y | x) p(x) / q(x)

It follows that:

ELBO = 𝔼ₓ₁[L₁]

...and:

log p(y) = log 𝔼ₓ₁[w(x₁)]

It turns out that there is a series of lower bounds given by taking multiple importance samples:

Lₙ = log (1/n) Σᵢ₌₁ⁿ w(xᵢ),   xᵢ ~ q(x)

And we have the relation:

log p(y) ≥ 𝔼[Lₙ] ≥ 𝔼[Lₙ₋₁] ≥ … ≥ 𝔼[L₁] = ELBO

This means that we can tighten the ELBO towards the log marginal likelihood by increasing n, which we refer to in this class as num_importance_samples. The trade-offs are:

- The objective function is now always stochastic, even in cases where the ELBO of the parent class is non-stochastic
- We have to do more computation (evaluate the weights n times)
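As a self-contained illustration of this tightening (independent of markovflow; the toy Gaussian model, the proposal, and the observation value below are assumptions made purely for the example), the following snippet estimates 𝔼[Lₙ] for a conjugate model where log p(y) is known exactly:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = 1.3  # a single toy observation

def log_w(x):
    # log importance weight: log p(y|x) + log p(x) - log q(x),
    # with p(x) = N(0, 1), p(y|x) = N(x, 1) and proposal q(x) = N(0.5, 1)
    return norm.logpdf(y, loc=x) + norm.logpdf(x, loc=0.0) - norm.logpdf(x, loc=0.5)

def expected_iw_bound(n, num_repeats=20000):
    # Monte Carlo estimate of E[L_n], with L_n = log (1/n) sum_i w(x_i), x_i ~ q(x)
    x = rng.normal(loc=0.5, scale=1.0, size=(num_repeats, n))
    l_n = np.logaddexp.reduce(log_w(x), axis=1) - np.log(n)
    return l_n.mean()

log_p_y = norm.logpdf(y, loc=0.0, scale=np.sqrt(2.0))  # exact log p(y) for this toy model
for n in (1, 2, 10, 100):
    print(f"n={n:4d}  E[L_n] ~ {expected_iw_bound(n):.4f}  (log p(y) = {log_p_y:.4f})")

As n grows, the estimate approaches log p(y) from below, at the cost of n weight evaluations per objective evaluation.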
- Parameters
  - kernel – A kernel that defines a prior over functions.
  - inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].
  - likelihood – A likelihood.
  - num_importance_samples – The number of samples for the importance-weighted estimator.
  - initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
  - mean_function – The mean function for the GP. Defaults to no mean function.
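A minimal construction sketch is shown below. The Matern32 kernel, the Gaussian likelihood, and all numeric values are illustrative assumptions, not requirements of this class:

import numpy as np
import tensorflow as tf
import gpflow

from markovflow.kernels import Matern32  # assumed kernel choice; any SDEKernel works
from markovflow.models.iwvi import ImportanceWeightedVI

# Inducing points on the time axis, shape [num_inducing] (no batch dimensions here).
inducing_points = tf.constant(np.linspace(0.0, 10.0, 20), dtype=tf.float64)

model = ImportanceWeightedVI(
    kernel=Matern32(lengthscale=1.0, variance=1.0),
    inducing_points=inducing_points,
    likelihood=gpflow.likelihoods.Gaussian(variance=0.1),
    num_importance_samples=10,
)

Note that data is not passed to the constructor; it is supplied later to elbo() or dregs_objective() as a (time_points, observations) tuple.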
-
elbo(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]¶

Compute the importance-weighted ELBO using K samples. The procedure is:

for k = 1...K:
    uₖ ~ q(u)
    sₖ ~ p(s | u)
    wₖ = p(y | sₖ) p(uₖ) / q(uₖ)
ELBO = log (1/K) Σₖ wₖ

Everything is computed in log-space for stability. Note that gradients of this ELBO may have high variance with regard to the variational parameters; see the DREGS gradient estimator method.
- Parameters
  input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model:
  - A tensor of inputs with shape batch_shape + [num_data]
  - A tensor of observations with shape batch_shape + [num_data, observation_dim]
- Returns
A scalar tensor.
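For illustration, the log-space reduction described above can be written as follows; iw_elbo_from_log_terms is a hypothetical helper, not part of markovflow:

import tensorflow as tf

def iw_elbo_from_log_terms(log_lik, log_prior, log_q):
    # log_lik[k] = log p(y | s_k), log_prior[k] = log p(u_k), log_q[k] = log q(u_k)
    log_w = log_lik + log_prior - log_q                        # per-sample log importance weights
    num_samples = tf.cast(tf.shape(log_w)[0], log_w.dtype)
    # log (1/K) sum_k w_k, computed stably via logsumexp
    return tf.reduce_logsumexp(log_w) - tf.math.log(num_samples)

In practice you call model.elbo((time_points, observations)) directly; the helper only spells out the logsumexp reduction over the K per-sample log weights.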
-
dregs_objective(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]¶

Compute a scalar tensor that, when differentiated using tf.gradients, produces the DREGS variance-controlled gradient. See “Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives” for a derivation.

We recommend using these gradients for training the variational parameters, and gradients of the importance-weighted ELBO for training the hyperparameters.
- Parameters
  input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model:
  - A tensor of inputs with shape batch_shape + [num_data]
  - A tensor of observations with shape batch_shape + [num_data, observation_dim]
- Returns
A scalar tensor.
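To make that recommendation concrete, the following hedged sketch takes DREGS gradients for the variational parameters and IW-ELBO gradients for the hyperparameters in a single optimisation step. How the model's tf.Variables are split into the two groups is left to the user; the argument lists below are assumptions of the sketch:

import tensorflow as tf

optimizer = tf.optimizers.Adam(learning_rate=1e-2)

def training_step(model, time_points, observations, variational_vars, hyper_vars):
    # variational_vars / hyper_vars: lists of tf.Variables chosen by the user to act as
    # variational parameters and hyperparameters respectively (an assumption of this sketch).
    data = (time_points, observations)
    with tf.GradientTape(persistent=True) as tape:
        variational_loss = -model.dregs_objective(data)  # low-variance DREGS gradients for q
        hyper_loss = -model.elbo(data)                   # IW-ELBO gradients for hyperparameters
    grads = tape.gradient(variational_loss, variational_vars)
    grads += tape.gradient(hyper_loss, hyper_vars)
    del tape
    optimizer.apply_gradients(zip(grads, variational_vars + hyper_vars))
    return hyper_loss

The persistent tape is needed only because two separate gradient calls are made; an equivalent design uses two optimisers, each with its own tape and variable group.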