markovflow.kalman_filter
Module containing a Kalman filter.
BaseKalmanFilter
Bases: tf.Module, abc.ABC
Performs a Kalman filter on a StateSpaceModel and EmissionModel, with given observations.
The key reference is:
@inproceedings{grigorievskiy2017parallelizable,
  title={Parallelizable sparse inverse formulation Gaussian processes (SpInGP)},
  author={Grigorievskiy, Alexander and Lawrence, Neil and S{\"a}rkk{\"a}, Simo},
  booktitle={Int'l Workshop on Machine Learning for Signal Processing (MLSP)},
  pages={1--6},
  year={2017},
  organization={IEEE}
}
The following notation from the above paper is used:
\(G = I_N ⊗ H\), where \(⊗\) is the Kronecker product
\(R\) is the observation covariance
\(Σ = I_N ⊗ R\)
\(K⁻¹ = A⁻ᵀQ⁻¹A⁻¹\) is the precision, where \(A⁻ᵀ = [Aᵀ]⁻¹ = [A⁻¹]ᵀ\)
\(L\) is the Cholesky of \(K⁻¹ + GᵀΣ⁻¹G\). That is, \(LLᵀ = K⁻¹ + GᵀΣ⁻¹G\)
\(y\) is the observation matrix
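As an illustration of this notation, a minimal NumPy sketch (toy shapes and values, not part of the library) builds \(G\) and \(Σ\) from the Kronecker structure:

```python
import numpy as np

# Toy sizes, chosen only for illustration.
num_data, state_dim, output_dim = 3, 2, 1

H = np.array([[1.0, 0.0]])            # emission matrix, [output_dim, state_dim]
R = np.array([[0.1]])                 # observation covariance, [output_dim, output_dim]

# G = I_N ⊗ H maps the stacked states to the stacked observations.
G = np.kron(np.eye(num_data), H)      # shape [num_data * output_dim, num_data * state_dim]

# Σ = I_N ⊗ R is the block-diagonal observation covariance.
Sigma = np.kron(np.eye(num_data), R)  # shape [num_data * output_dim, num_data * output_dim]
```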
state_space_model – Parametrises the latent chain.
emission_model – Maps the latent chain to the observations.
_r_inv
Precision of observation model
observations
Observation vector
_k_inv_prior
Prior precision
_k_inv_post
Posterior precision
_log_det_observation_precision
Sum of log determinant of the precisions of the observation model
posterior_state_space_model
Return the posterior as a state space model.
The marginal means and covariances are given by:
\(P = (K⁻¹ + GᵀΣ⁻¹G)⁻¹ = (LLᵀ)⁻¹\) and \(μ = PGᵀΣ⁻¹y\)
…where \(μ\) is a block vector of the marginal means.
We can derive the state transitions \(aₖ\) and process noise covariances \(qₖ\) from the block tridiagonal matrix (see upper_diagonal_lower()). Lower case distinguishes the posterior parameters from the prior parameters.
We then need to calculate \(μ₀\) and \(bₖ\) (this is what most of the code in this function does). This can be calculated as follows.
Firstly, we use that for any StateSpaceModel, the marginal means satisfy \(A⁻¹μ = m\):
…where \(m = [μ₀, b₁,... bₙ]\) and:
A⁻¹ = [  I            ]    Q⁻¹ = [ P₀⁻¹            ]
      [ -A₁   I        ]         [      Q₁⁻¹       ]
      [       ⋱    ⋱   ]         [           ⋱     ]
      [        -Aₙ   I ]         [             Qₙ⁻¹ ]
So \(m = A⁻¹μ\), where \(A⁻¹\) is now built from the posterior transitions \(aₖ\): the first block gives \(μ₀\) and the remaining blocks give \(bₖ = μₖ - aₖμₖ₋₁\).
The posterior as a state space model.
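The posterior marginal means can be sketched with dense NumPy/SciPy linear algebra (a toy, dense stand-in for the banded operators the library actually uses; all sizes and values are hypothetical):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n = 4                                  # total state dimension of the toy example

# A stand-in symmetric positive definite prior precision K⁻¹ = A⁻ᵀQ⁻¹A⁻¹.
M = rng.standard_normal((n, n))
K_inv = M @ M.T + n * np.eye(n)
G = np.eye(n)                          # identity emission, for simplicity
Sigma_inv = 2.0 * np.eye(n)            # observation precision Σ⁻¹
y = rng.standard_normal(n)

# Posterior precision K⁻¹ + GᵀΣ⁻¹G = LLᵀ; the marginal means solve (LLᵀ)μ = GᵀΣ⁻¹y.
posterior_precision = K_inv + G.T @ Sigma_inv @ G
c, low = cho_factor(posterior_precision)
mu = cho_solve((c, low), G.T @ Sigma_inv @ y)
```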
log_likelihood
Construct a TensorFlow function to compute the likelihood.
We set \(y = obs - Hμ\) (where \(μ\) is the vector of marginal state means), so the log likelihood is that of a zero mean Gaussian:
\(log p(y) = - ᴺ⁄₂log(2π) - ½log |Σ + GKGᵀ| - ½ yᵀ(Σ + GKGᵀ)⁻¹y\)
…where \(N\) is the dimensionality of the precision object, that is state_dim * (num_transitions + 1).
We break the log likelihood up as cst + term1 + term2 + term3, where:
cst: \(- ᴺ⁄₂log(2π)\)
term 1: \(- ½ yᵀΣ⁻¹y\)
term 2: \(½ yᵀΣ⁻¹G(K⁻¹ + GᵀΣ⁻¹G)⁻¹GᵀΣ⁻¹y = ½ yᵀΣ⁻¹G(LLᵀ)⁻¹GᵀΣ⁻¹y = ½|L⁻¹GᵀΣ⁻¹y|²\)
term 3: \(- ½(log |K⁻¹ + GᵀΣ⁻¹G| - log |K⁻¹| - log |Σ⁻¹|) = ½log |K⁻¹| - log |L| + ½log |Σ⁻¹|\)
Note that there are a couple of mistakes in formula (18) of the SpInGP paper:
They have \(- ½(... + log |Σ⁻¹|)\). It should be \(- ½(... - log |Σ⁻¹|)\)
They have \(- ½ yᵀ(... Σ⁻¹G(K⁻¹ + GᵀΣ⁻¹G)⁻¹)y\). It should be \(- ½ yᵀ(... Σ⁻¹G(K⁻¹ + GᵀΣ⁻¹G)⁻¹GᵀΣ⁻¹)y\)
The likelihood as a scalar tensor (we sum over the batch_shape).
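The decomposition above can be checked numerically against the direct Gaussian log density \(N(y; 0, Σ + GKGᵀ)\) on a small dense example (toy values; \(G\) is taken square here so the dimensions coincide):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
N = 5                                   # toy dimension

# Toy SPD precisions and a square emission matrix (illustration only).
M = rng.standard_normal((N, N))
K_inv = M @ M.T + N * np.eye(N)                   # prior precision K⁻¹
Sigma_inv = np.diag(rng.uniform(1.0, 2.0, N))     # observation precision Σ⁻¹
G = rng.standard_normal((N, N))
y = rng.standard_normal(N)

# Cholesky of K⁻¹ + GᵀΣ⁻¹G.
L = np.linalg.cholesky(K_inv + G.T @ Sigma_inv @ G)

cst = -0.5 * N * np.log(2 * np.pi)
term1 = -0.5 * y @ Sigma_inv @ y
v = np.linalg.solve(L, G.T @ Sigma_inv @ y)       # L⁻¹(GᵀΣ⁻¹y)
term2 = 0.5 * v @ v
_, logdet_K_inv = np.linalg.slogdet(K_inv)
_, logdet_Sigma_inv = np.linalg.slogdet(Sigma_inv)
term3 = 0.5 * logdet_K_inv - np.sum(np.log(np.diag(L))) + 0.5 * logdet_Sigma_inv

# Compare with the direct Gaussian log density N(y; 0, Σ + GKGᵀ).
K = np.linalg.inv(K_inv)
Sigma = np.linalg.inv(Sigma_inv)
direct = multivariate_normal(mean=np.zeros(N), cov=Sigma + G @ K @ G.T).logpdf(y)
```

The two agree by the Woodbury identity (terms 1 and 2) and the matrix determinant lemma (term 3).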
_back_project_y_to_state
Back project from the observation space to the state space; that is, calculate \((GᵀΣ⁻¹)y\).
observations – a tensor y of shape batch_shape + [num_data, output_dim]
a tensor (GᵀΣ⁻¹)y of shape batch_shape + [num_data, state_dim]
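Because \(G = I_N ⊗ H\) and \(Σ = I_N ⊗ R\), the back projection applies \(HᵀR⁻¹\) independently to each observation vector; a NumPy sketch with toy shapes (not library code):

```python
import numpy as np

num_data, state_dim, output_dim = 4, 3, 2
rng = np.random.default_rng(2)
H = rng.standard_normal((output_dim, state_dim))
R = np.eye(output_dim) * 0.5
y = rng.standard_normal((num_data, output_dim))

# Per-time-step application: yₖ ↦ HᵀR⁻¹yₖ, result shape [num_data, state_dim].
back_projected = y @ np.linalg.inv(R).T @ H

# The same result via the full Kronecker-structured matrices.
G = np.kron(np.eye(num_data), H)
Sigma_inv = np.kron(np.eye(num_data), np.linalg.inv(R))
full = G.T @ Sigma_inv @ y.ravel()
```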
KalmanFilter
Bases: BaseKalmanFilter
observations – Data with shape [num_transitions + 1, output_dim].
chol_obs_covariance – A TensorType with shape [output_dim, output_dim] for the Cholesky factor of the covariance to be applied to \(f\) from emission_model.
emission_model
Precision of the observation model
GaussianSites
This class is a wrapper around the parameters specifying multiple independent Gaussian distributions.
means
Return the means of the Gaussians.
precisions
Return the precisions of the Gaussians.
log_det_precisions
Return the sum of the log determinant of the observation precisions.
UnivariateGaussianSitesNat
Bases: GaussianSites
This class is a wrapper around parameters of univariate Gaussian distributions in the natural form. That is:
\(p(f) ∝ exp(η₁f + η₂f²) = exp(𝞰ᵀ𝛗(f))\)
…where \(𝞰=[η₁,η₂]\) and \(𝛗(f)=[f,f²]\).
The mean \(μ\) and variance \(σ²\) parameterization is such that:
\(η₁ = μ/σ²\) and \(η₂ = -1/(2σ²)\)
nat1 – first natural parameter [N, D]
nat2 – second natural parameter [N, D, D]
log_norm – normalizer parameter [N, D]
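Assuming the standard natural parameterization of a univariate Gaussian (\(η₁ = μ/σ²\), \(η₂ = -1/(2σ²)\)), the two parameterizations convert as follows (illustrative sketch, not library code):

```python
import numpy as np

# Toy mean/variance parameters for three independent univariate Gaussians.
mu = np.array([0.5, -1.0, 2.0])
var = np.array([0.1, 0.5, 2.0])

# Natural parameters: η₁ = μ/σ², η₂ = -1/(2σ²).
nat1 = mu / var
nat2 = -0.5 / var

# Recover the mean/variance parameterization.
var_back = -0.5 / nat2
mu_back = nat1 * var_back
```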
KalmanFilterWithSites
Performs a Kalman filter on a StateSpaceModel and EmissionModel, with Gaussian sites; that is, time-dependent Gaussian likelihood terms.
\(G = I_N ⊗ H\), where \(⊗\) is the Kronecker product
\(R = [R₁, R₂, ... Rₙ]\) is the observation covariance
\(Σ = blockdiag[R]\)
\(K⁻¹ = A⁻ᵀQ⁻¹A⁻¹\) is the precision, where \(A⁻ᵀ = [Aᵀ]⁻¹ = [A⁻¹]ᵀ\)
\(L\) is the Cholesky of \(K⁻¹ + GᵀΣ⁻¹G\). That is, \(LLᵀ = K⁻¹ + GᵀΣ⁻¹G\)
\(y\) is the observation matrix
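With time-dependent sites each step has its own covariance \(Rₖ\), so \(Σ\) is a general block diagonal rather than \(I_N ⊗ R\); a small sketch with hypothetical per-step covariances:

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical per-time-step observation covariances (output_dim = 1).
R1, R2, R3 = np.array([[0.1]]), np.array([[0.5]]), np.array([[2.0]])

# Σ = blockdiag[R₁, R₂, R₃]; each block may differ, unlike I_N ⊗ R.
Sigma = block_diag(R1, R2, R3)
```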
sites – Gaussian sites parameterizing the Gaussian likelihoods.
Precisions of the observation model
KalmanFilterWithSparseSites
Performs a Kalman filter on a StateSpaceModel and EmissionModel, with Gaussian sites, over a time grid.
state_space_model – Parameterises the latent chain.
sites – Gaussian sites over the observations.
num_grid_points – Number of grid points.
observations_index – Index of the observations in the time grid with shape (N,).
observations – Sparse observations with shape [n_batch] + [N, output_dim].
Precisions of the observation model over the time grid.
_drop_batch_shape
Check that the batch dimension, if present, equals 1, and drop it.
Sum of the log determinants of the precisions of the observation model. Only the data sites contribute, since the precisions of the remaining sites are zero.
Sparse observation vector
_r_inv_data
Precisions of the observation model for only the data sites.
sparse_to_dense
Convert a sparse tensor to a dense one using the observations index; the output tensor has the given output shape.
dense_to_sparse
Convert a dense tensor to a sparse one using the observations index.
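A NumPy sketch of the two conversions (toy index and values; the library itself works with TensorFlow tensors):

```python
import numpy as np

num_grid_points, output_dim = 6, 1
observations_index = np.array([1, 3, 4])            # shape (N,)
observations = np.array([[10.0], [20.0], [30.0]])   # shape (N, output_dim)

# sparse_to_dense: scatter the observations onto the time grid, zeros elsewhere.
dense = np.zeros((num_grid_points, output_dim))
dense[observations_index] = observations

# dense_to_sparse: gather back the observed entries using the same index.
sparse_again = dense[observations_index]
```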
Construct a TensorFlow function to compute the likelihood.
For more mathematical details, see the log_likelihood function of the parent class. The main difference from the parent class is that the vector of observations is now sparse.