markovflow.models.iwvi¶
Module containing a model for importance-weighted variational inference.
Module Contents¶
-
class ImportanceWeightedVI(kernel: markovflow.kernels.SDEKernel, inducing_points: tf.Tensor, likelihood: gpflow.likelihoods.Likelihood, num_importance_samples: int, initial_distribution: Optional[markovflow.state_space_model.StateSpaceModel] = None, mean_function: Optional[markovflow.mean_function.MeanFunction] = None)[source]¶

Bases: markovflow.models.sparse_variational.SparseVariationalGaussianProcess
Performs importance-weighted variational inference (IWVI).
The key reference is:
@inproceedings{domke2018importance,
  title={Importance weighting and variational inference},
  author={Domke, Justin and Sheldon, Daniel R},
  booktitle={Advances in Neural Information Processing Systems},
  pages={4470--4479},
  year={2018}
}
The idea is based on the observation that an estimator of the evidence lower bound (ELBO) can be obtained from an importance weight w:
L₁ = log w(x₁),   x₁ ~ q(x)

...where x is the latent variable of the model (a GP, or a set of GPs in our case) and the function w is:

w(x) = p(y | x) p(x) / q(x)

It follows that:

ELBO = 𝔼ₓ₁[L₁]

...and:

log p(y) = log 𝔼ₓ₁[w(x₁)]

It turns out that there is a series of lower bounds given by taking multiple importance samples:

Lₙ = log (1/n) Σᵢ₌₁ⁿ w(xᵢ),   xᵢ ~ q(x)

And we have the relation:

log p(y) ≥ 𝔼[Lₙ] ≥ 𝔼[Lₙ₋₁] ≥ … ≥ 𝔼[L₁] = ELBO

This means that we can tighten the ELBO towards the log marginal likelihood by increasing n, which we refer to in this class as num_importance_samples. The trade-offs are:

- The objective function is now always stochastic, even in cases where the ELBO of the parent class is non-stochastic
- We have to do more computation (evaluate the weights n times)
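As a self-contained illustration of this tightening (independent of markovflow; the toy Gaussian model, the proposal, and the observation value below are assumptions made purely for the example), the following snippet estimates 𝔼[Lₙ] for a conjugate model where log p(y) is known exactly:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = 1.3  # a single toy observation

def log_w(x):
    # log importance weight: log p(y|x) + log p(x) - log q(x),
    # with p(x) = N(0, 1), p(y|x) = N(x, 1) and proposal q(x) = N(0.5, 1)
    return norm.logpdf(y, loc=x) + norm.logpdf(x, loc=0.0) - norm.logpdf(x, loc=0.5)

def expected_iw_bound(n, num_repeats=20000):
    # Monte Carlo estimate of E[L_n], with L_n = log (1/n) sum_i w(x_i), x_i ~ q(x)
    x = rng.normal(loc=0.5, scale=1.0, size=(num_repeats, n))
    l_n = np.logaddexp.reduce(log_w(x), axis=1) - np.log(n)
    return l_n.mean()

log_p_y = norm.logpdf(y, loc=0.0, scale=np.sqrt(2.0))  # exact log p(y) for this toy model
for n in (1, 2, 10, 100):
    print(f"n={n:4d}  E[L_n] ~ {expected_iw_bound(n):.4f}  (log p(y) = {log_p_y:.4f})")

As n grows, the estimate approaches log p(y) from below, at the cost of n weight evaluations per objective evaluation.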
- Parameters
  - kernel – A kernel that defines a prior over functions.
  - inducing_points – The points in time on which inference should be performed, with shape batch_shape + [num_inducing].
  - likelihood – A likelihood.
  - num_importance_samples – The number of samples for the importance-weighted estimator.
  - initial_distribution – An initial configuration for the variational distribution, with shape batch_shape + [num_inducing].
  - mean_function – The mean function for the GP. Defaults to no mean function.
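A minimal construction sketch is shown below. The Matern32 kernel, the Gaussian likelihood, and all numeric values are illustrative assumptions, not requirements of this class:

import numpy as np
import tensorflow as tf
import gpflow

from markovflow.kernels import Matern32  # assumed kernel choice; any SDEKernel works
from markovflow.models.iwvi import ImportanceWeightedVI

# Inducing points on the time axis, shape [num_inducing] (no batch dimensions here).
inducing_points = tf.constant(np.linspace(0.0, 10.0, 20), dtype=tf.float64)

model = ImportanceWeightedVI(
    kernel=Matern32(lengthscale=1.0, variance=1.0),
    inducing_points=inducing_points,
    likelihood=gpflow.likelihoods.Gaussian(variance=0.1),
    num_importance_samples=10,
)

Note that data is not passed to the constructor; it is supplied later to elbo() or dregs_objective() as a (time_points, observations) tuple.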
-
elbo(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]¶

Compute the importance-weighted ELBO using K samples. The procedure is:

for k = 1...K:
    uₖ ~ q(u)
    sₖ ~ p(s | u)
    wₖ = p(y | sₖ) p(uₖ) / q(uₖ)
ELBO = log (1/K) Σₖ wₖ

Everything is computed in log-space for stability. Note that gradients of this ELBO may have high variance with regard to the variational parameters; see the DREGS gradient estimator method.
- Parameters
  input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model:
  - A tensor of inputs with shape batch_shape + [num_data]
  - A tensor of observations with shape batch_shape + [num_data, observation_dim]
- Returns
A scalar tensor.
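For illustration, the log-space reduction described above can be written as follows; iw_elbo_from_log_terms is a hypothetical helper, not part of markovflow:

import tensorflow as tf

def iw_elbo_from_log_terms(log_lik, log_prior, log_q):
    # log_lik[k] = log p(y | s_k), log_prior[k] = log p(u_k), log_q[k] = log q(u_k)
    log_w = log_lik + log_prior - log_q                        # per-sample log importance weights
    num_samples = tf.cast(tf.shape(log_w)[0], log_w.dtype)
    # log (1/K) sum_k w_k, computed stably via logsumexp
    return tf.reduce_logsumexp(log_w) - tf.math.log(num_samples)

In practice you call model.elbo((time_points, observations)) directly; the helper only spells out the logsumexp reduction over the K per-sample log weights.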
-
dregs_objective(input_data: Tuple[tf.Tensor, tf.Tensor]) → tf.Tensor[source]¶

Compute a scalar tensor that, when differentiated using tf.gradients, produces the DREGS variance-controlled gradient. See “Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives” for a derivation.

We recommend using these gradients for training the variational parameters, and gradients of the importance-weighted ELBO for training the hyperparameters.
- Parameters
  input_data – A tuple of time points and observations containing the data at which to calculate the loss for training the model:
  - A tensor of inputs with shape batch_shape + [num_data]
  - A tensor of observations with shape batch_shape + [num_data, observation_dim]
- Returns
A scalar tensor.
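To make that recommendation concrete, the following hedged sketch takes DREGS gradients for the variational parameters and IW-ELBO gradients for the hyperparameters in a single optimisation step. How the model's tf.Variables are split into the two groups is left to the user; the argument lists below are assumptions of the sketch:

import tensorflow as tf

optimizer = tf.optimizers.Adam(learning_rate=1e-2)

def training_step(model, time_points, observations, variational_vars, hyper_vars):
    # variational_vars / hyper_vars: lists of tf.Variables chosen by the user to act as
    # variational parameters and hyperparameters respectively (an assumption of this sketch).
    data = (time_points, observations)
    with tf.GradientTape(persistent=True) as tape:
        variational_loss = -model.dregs_objective(data)  # low-variance DREGS gradients for q
        hyper_loss = -model.elbo(data)                   # IW-ELBO gradients for hyperparameters
    grads = tape.gradient(variational_loss, variational_vars)
    grads += tape.gradient(hyper_loss, hyper_vars)
    del tape
    optimizer.apply_gradients(zip(grads, variational_vars + hyper_vars))
    return hyper_loss

The persistent tape is needed only because two separate gradient calls are made; an equivalent design uses two optimisers, each with its own tape and variable group.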