markovflow.ssm_natgrad
Module containing a natural gradient optimiser.
SSMNaturalGradient
Bases: tensorflow.optimizers.Optimizer
Represents a natural gradient optimiser. It is also capable of updating parameters with momentum, as per the Adam optimiser.
To account for momentum we keep track of a running moving average of \(\tilde{g}\) and of its Fisher norm, where \(\tilde{g} = F^{-1} g\) is the natural gradient, that is, the Euclidean gradient \(g\) preconditioned by the inverse Fisher information matrix \(F^{-1}\).
The Fisher norm of the natural gradient is given by \(\lVert \tilde{g} \rVert_F^2 = \tilde{g}^\top F \tilde{g} = \tilde{g}^\top g\), which is the inner product between the natural gradient and the Euclidean gradient.
The moving averages for the natural gradient and its Fisher norm, and the final parameter update, follow an Adam-style scheme; a sketch is given below.
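This reconstruction assumes the same structure as Adam, with the parameter roles listed afterwards; the exact form used in the implementation (for example, any bias correction) may differ.

\[
\bar{g} \leftarrow \beta_1 \bar{g} + (1 - \beta_1)\,\tilde{g}, \qquad
\bar{n} \leftarrow \beta_2 \bar{n} + (1 - \beta_2)\,\tilde{g}^\top g, \qquad
\theta \leftarrow \theta - \gamma\,\frac{\bar{g}}{\sqrt{\bar{n}} + \epsilon}.
\]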
gamma – The learning rate of the optimiser.
momentum – Whether to update with momentum or not.
beta1 – The momentum parameter for the moving average of the natural gradient.
beta2 – The momentum parameter for the moving average of the norm of the natural gradient.
epsilon – A small constant to make sure we do not divide by \(0\) in the momentum term.
name – Optional name to give the optimiser.
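As a hedged illustration of the constructor, the snippet below builds the optimiser with the arguments listed above; the particular values are illustrative assumptions rather than documented defaults.

```python
from markovflow.ssm_natgrad import SSMNaturalGradient

# Illustrative values only; they are assumptions, not documented defaults.
opt = SSMNaturalGradient(
    gamma=0.1,       # natural gradient learning rate
    momentum=True,   # enable the Adam-style moving averages
    beta1=0.9,       # moving average of the natural gradient
    beta2=0.99,      # moving average of its Fisher norm
    epsilon=1e-8,    # guards against division by zero in the momentum term
    name="ssm_natgrad",
)
```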
minimize
Minimise the objective function of the model.
Note that the natural gradient optimiser works with variational parameters only; a usage sketch follows the parameters below.
loss_fn – The Loss function.
ssm – A state space model that represents our variational posterior.
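A hedged usage sketch, assuming loss_fn is a zero-argument callable returning the scalar loss; model, model.loss and model.dist_q are hypothetical stand-ins for a MarkovFlow variational model, its loss and its state-space posterior, not names guaranteed by the library.

```python
import tensorflow as tf

# `opt` is the SSMNaturalGradient instance constructed above; `model` is a
# hypothetical MarkovFlow variational model exposing a loss and a state space
# posterior (here called `dist_q` purely for illustration).
@tf.function
def natgrad_training_step():
    opt.minimize(loss_fn=lambda: model.loss(), ssm=model.dist_q)
```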
_natgrad_steps
Call a natgrad step after wrapping it in a name scope.
loss_fn – A Loss function.
_natgrad_step
Implements equation [10] from:
Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. "Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models." AISTATS, 2018.
In addition, for convenience with the rest of MarkovFlow, this code computes \(\partial L / \partial \eta\) using the chain rule:

\[
\tilde{g} = \frac{\partial L}{\partial \eta} = \left[ \frac{\partial L}{\partial\,[\mathrm{ssm\_variables}]} \; \frac{\partial\,[\mathrm{ssm\_variables}]}{\partial \eta} \right]^\top
\]

In the code, \(\eta\) is eta and \(\theta\) is theta.
loss_fn – Loss function.
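A hedged, self-contained toy sketch of the chain rule above: the gradient with respect to eta is obtained by differentiating the loss through the ssm variables, and the resulting \(\tilde{g}\) is used to step the natural parameters theta. The reparameterisation and loss below are toy stand-ins, not MarkovFlow's actual internals.

```python
import tensorflow as tf

def ssm_variables_from_eta(eta):
    # Toy stand-in for the map from expectation parameters to ssm variables.
    return tf.stack([eta[0], tf.exp(eta[1])])

def toy_loss(ssm_variables):
    # Toy stand-in for the variational loss.
    return tf.reduce_sum(tf.square(ssm_variables - 1.0))

eta = tf.Variable([0.5, 0.0])    # expectation parameters
theta = tf.Variable([0.5, 0.0])  # natural parameters
gamma = 0.1                      # learning rate

with tf.GradientTape() as tape:
    loss = toy_loss(ssm_variables_from_eta(eta))

# dL/d(eta) = (dL/d[ssm_variables]) (d[ssm_variables]/d(eta)), via autodiff.
g_tilde = tape.gradient(loss, eta)

# Natural gradient step on the natural parameters.
theta.assign_sub(gamma * g_tilde)
```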
get_config
Return a Python dictionary containing the configuration of the optimiser.
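A minimal usage sketch, assuming opt is the SSMNaturalGradient instance built earlier; the exact keys of the returned dictionary are not documented here.

```python
config = opt.get_config()
print(config)  # a Python dict describing the optimiser's configuration
```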
_resource_apply_dense
Add ops to apply dense gradients to the variable handle.
grad – A Tensor representing the gradient.
handle – A Tensor of dtype resource which points to the variable to be updated.
apply_state – A dict which is used across multiple apply calls.
Returns an Operation which updates the value of the variable.
_resource_apply_sparse
Add ops to apply sparse gradients to the variable handle.
Similar to _apply_sparse, the indices argument to this method has been de-duplicated. Optimizers which deal correctly with non-unique indices may instead override _resource_apply_sparse_duplicate_indices to avoid this overhead.
grad – A Tensor representing the gradient for the affected indices.
handle – A Tensor of dtype resource which points to the variable to be updated.
indices – A Tensor of indices for which the gradient is nonzero. Indices are unique.