markovflow.kalman_filter

Module containing a Kalman filter.

Module Contents

class BaseKalmanFilter(state_space_model: markovflow.state_space_model.StateSpaceModel, emission_model: markovflow.emission_model.EmissionModel)[source]

Bases: tf.Module, abc.ABC

Performs a Kalman filter on a StateSpaceModel and EmissionModel, with given observations.

The key reference is:

@inproceedings{grigorievskiy2017parallelizable,
    title={Parallelizable sparse inverse formulation Gaussian processes (SpInGP)},
    author={Grigorievskiy, Alexander and Lawrence, Neil and S{\"a}rkk{\"a}, Simo},
    booktitle={Int'l Workshop on Machine Learning for Signal Processing (MLSP)},
    pages={1--6},
    year={2017},
    organization={IEEE}
}

The following notation from the above paper is used:

  • \(G = I_N ⊗ H\), where \(⊗\) is the Kronecker product

  • \(R\) is the observation covariance

  • \(Σ = I_N ⊗ R\)

  • \(K⁻¹ = A⁻ᵀQ⁻¹A⁻¹\) is the precision, where \(A⁻ᵀ = [Aᵀ]⁻¹ = [A⁻¹]ᵀ\)

  • \(L\) is the Cholesky of \(K⁻¹ + GᵀΣ⁻¹G\). That is, \(LLᵀ = K⁻¹ + GᵀΣ⁻¹G\)

  • \(y\) is the observation matrix
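
For intuition, a minimal NumPy sketch of this notation (all shapes and values below are illustrative, not part of the API):

import numpy as np

N = 3                          # number of time steps
H = np.array([[1.0, 0.0]])     # emission matrix: state_dim=2 → output_dim=1
R = np.array([[0.1]])          # observation covariance

G = np.kron(np.eye(N), H)      # G = I_N ⊗ H
Sigma = np.kron(np.eye(N), R)  # Σ = I_N ⊗ R (block-diagonal)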

Parameters
  • state_space_model – Parametrises the latent chain.

  • emission_model – Maps the latent chain to the observations.

property _r_inv[source]

Precision of the observation model

property observations[source]

Observation vector

property _k_inv_prior[source]

Prior precision

property _k_inv_post[source]

Posterior precision

property _log_det_observation_precision[source]

Sum of the log determinants of the precisions of the observation model

posterior_state_space_model() → markovflow.state_space_model.StateSpaceModel[source]

Return the posterior as a state space model.

The marginal means and covariances are given by:

\[\begin{split}&μ(X) = (K⁻¹ + GᵀΣ⁻¹G)⁻¹[GᵀΣ⁻¹y + K⁻¹μ]\\ &P(X) = (K⁻¹ + GᵀΣ⁻¹G)⁻¹\end{split}\]

…where \(μ\) is a block vector of the marginal means.

We can derive the state transitions \(aₖ\) and process noise covariances \(qₖ\) from the block tridiagonal matrix (see upper_diagonal_lower()). Lower case is used to distinguish the posterior parameters from those of the prior.

We then need to calculate \(μ₀\) and \(bₖ\) (this is what most of the code in this function does). These can be calculated from:

\[K⁻¹ₚₒₛₜμₚₒₛₜ = GᵀΣ⁻¹y + K⁻¹ₚᵣᵢₒᵣμₚᵣᵢₒᵣ\]

Firstly, we use that for any StateSpaceModel:

\[K⁻¹μ = A⁻ᵀ Q⁻¹ m\]

…where \(m = [μ₀, b₁, …, bₙ]\) and:

A⁻¹ =  [ I             ]      Q⁻¹ =  [ P₀⁻¹          ]
       [-A₁, I         ]            [    Q₁⁻¹       ]
       [    -A₂, I     ]            [       ⋱      ]
       [         ⋱  ⋱  ]            [         ⋱    ]
       [         -Aₙ, I]            [           Qₙ⁻¹]

So:

\[mₚₒₛₜ = Qₚₒₛₜ Aₚₒₛₜᵀ [GᵀΣ⁻¹y + Kₚᵣᵢₒᵣ⁻¹μₚᵣᵢₒᵣ]\]

Returns

The posterior as a state space model.
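
A scalar-state numeric check of the identity \(K⁻¹μ = A⁻ᵀQ⁻¹m\) used above (all values are illustrative, not part of the API):

import numpy as np

# Tiny chain: state_dim=1, two transitions, so the stacked state has 3 blocks.
A1, A2 = 0.9, 0.8              # transitions Aₖ
P0, Q1, Q2 = 1.0, 0.2, 0.3     # initial/process noise covariances
mu0, b1, b2 = 0.5, 0.1, -0.2   # m = [μ₀, b₁, b₂]

Ainv = np.array([[1, 0, 0], [-A1, 1, 0], [0, -A2, 1]], dtype=float)
Qinv = np.diag([1 / P0, 1 / Q1, 1 / Q2])
m = np.array([mu0, b1, b2])

mu = np.linalg.solve(Ainv, m)      # marginal means satisfy A⁻¹μ = m, i.e. μ = A m
K_inv = Ainv.T @ Qinv @ Ainv       # precision K⁻¹ = A⁻ᵀ Q⁻¹ A⁻¹
assert np.allclose(K_inv @ mu, Ainv.T @ Qinv @ m)   # K⁻¹μ = A⁻ᵀQ⁻¹m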

log_likelihood() → tf.Tensor[source]

Construct a TensorFlow function to compute the likelihood.

We set \(y = obs - Hμ\) (where \(μ\) is the vector of marginal state means):

\[\begin{split}log p(obs|params) = &- ᴺ⁄₂log(2π) - ½(log |K⁻¹ + GᵀΣ⁻¹G| - log |K⁻¹| - log |Σ⁻¹|)\\ &- ½ yᵀ(Σ⁻¹ - Σ⁻¹G(K⁻¹ + GᵀΣ⁻¹G)⁻¹GᵀΣ⁻¹)y\end{split}\]

…where \(N\) is the dimensionality of the precision object, that is state_dim * (num_transitions + 1).

We break up the log likelihood as: cst + term1 + term2 + term3. That is, as:

  • cst: \(- ᴺ⁄₂log(2π)\)

  • term 1: \(- ½ yᵀΣ⁻¹y\)

  • term 2:

\[½ yᵀΣ⁻¹G(K⁻¹ + GᵀΣ⁻¹G)⁻¹GᵀΣ⁻¹y = ½ yᵀΣ⁻¹G(LLᵀ)⁻¹GᵀΣ⁻¹y = ½|L⁻¹GᵀΣ⁻¹y|²\]
  • term 3:

    \[- ½(log |K⁻¹ + GᵀΣ⁻¹G| - log |K⁻¹| - log |Σ⁻¹|) = ½log |K⁻¹| - log |L| + ½log |Σ⁻¹|\]

Note that there are a couple of mistakes in formula (18) of the SpInGP paper:

  • They have \(- ½(... + log |Σ⁻¹|)\). It should be \(- ½(... - log |Σ⁻¹|)\)

  • They have \(- ½ yᵀ(... Σ⁻¹G(K⁻¹ + GᵀΣ⁻¹G)⁻¹)y\). It should be \(- ½ yᵀ(... Σ⁻¹G(K⁻¹ + GᵀΣ⁻¹G)⁻¹GᵀΣ⁻¹)y\)

Returns

The likelihood as a scalar tensor (we sum over the batch_shape).
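
Terms 2 and 3 rest on the standard Woodbury identity and matrix determinant lemma; a NumPy check with arbitrary illustrative matrices (a sketch, not library code):

import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                                    # latent dim, observation dim (illustrative)
K = np.eye(n) + 0.1 * rng.standard_normal((n, n))
K = K @ K.T                                    # any symmetric positive-definite K
G = rng.standard_normal((d, n))
Sigma = np.diag(rng.uniform(0.5, 1.0, d))
y = rng.standard_normal(d)

K_inv, S_inv = np.linalg.inv(K), np.linalg.inv(Sigma)
M = K_inv + G.T @ S_inv @ G                    # K⁻¹ + GᵀΣ⁻¹G = LLᵀ

# Woodbury: (GKGᵀ + Σ)⁻¹ = Σ⁻¹ − Σ⁻¹G M⁻¹ GᵀΣ⁻¹  (quadratic form = term 1 + term 2)
lhs = y @ np.linalg.inv(G @ K @ G.T + Sigma) @ y
rhs = y @ (S_inv - S_inv @ G @ np.linalg.inv(M) @ G.T @ S_inv) @ y
assert np.allclose(lhs, rhs)

# Determinant lemma: log|GKGᵀ + Σ| = log|M| − log|K⁻¹| − log|Σ⁻¹|  (term 3)
lhs = np.linalg.slogdet(G @ K @ G.T + Sigma)[1]
rhs = np.linalg.slogdet(M)[1] - np.linalg.slogdet(K_inv)[1] - np.linalg.slogdet(S_inv)[1]
assert np.allclose(lhs, rhs)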

_back_project_y_to_state(observations: tf.Tensor) → tf.Tensor[source]

Back-project from the observation space to the state space; that is, calculate \((GᵀΣ⁻¹)y\).

Parameters

observations – a tensor y of shape batch_shape + [num_data, output_dim]

Returns

a tensor (GᵀΣ⁻¹)y of shape batch_shape + [num_data, state_dim]
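
Because \(G = I_N ⊗ H\) and \(Σ = I_N ⊗ R\), the product \((GᵀΣ⁻¹)y\) reduces to \(HᵀR⁻¹yₖ\) per time step, with no dense Kronecker products needed. A NumPy sketch of this reduction (names and shapes are illustrative):

import numpy as np

H = np.array([[1.0, 0.0]])                # emission matrix [output_dim, state_dim]
R_inv = np.linalg.inv(np.array([[0.1]]))  # observation precision R⁻¹
y = np.random.default_rng(1).standard_normal((5, 1))   # [num_data, output_dim]

# Block-wise computation: row k is (Hᵀ R⁻¹ yₖ)ᵀ = yₖᵀ R⁻ᵀ H
back_projected = y @ R_inv.T @ H          # [num_data, state_dim]

# Dense check via the Kronecker structure G = I_N ⊗ H, Σ⁻¹ = I_N ⊗ R⁻¹
N = y.shape[0]
G = np.kron(np.eye(N), H)
Sigma_inv = np.kron(np.eye(N), R_inv)
assert np.allclose((G.T @ Sigma_inv @ y.reshape(-1)).reshape(N, -1), back_projected)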

class KalmanFilter(state_space_model: markovflow.state_space_model.StateSpaceModel, emission_model: markovflow.emission_model.EmissionModel, observations: tf.Tensor, chol_obs_covariance: gpflow.base.TensorType)[source]

Bases: BaseKalmanFilter

Performs a Kalman filter on a StateSpaceModel and EmissionModel, with given observations.

The key reference is:

@inproceedings{grigorievskiy2017parallelizable,
    title={Parallelizable sparse inverse formulation Gaussian processes (SpInGP)},
    author={Grigorievskiy, Alexander and Lawrence, Neil and S{\"a}rkk{\"a}, Simo},
    booktitle={Int'l Workshop on Machine Learning for Signal Processing (MLSP)},
    pages={1--6},
    year={2017},
    organization={IEEE}
}

The following notation from the above paper is used:

  • \(G = I_N ⊗ H\), where \(⊗\) is the Kronecker product

  • \(R\) is the observation covariance

  • \(Σ = I_N ⊗ R\)

  • \(K⁻¹ = A⁻ᵀQ⁻¹A⁻¹\) is the precision, where \(A⁻ᵀ = [Aᵀ]⁻¹ = [A⁻¹]ᵀ\)

  • \(L\) is the Cholesky of \(K⁻¹ + GᵀΣ⁻¹G\). That is, \(LLᵀ = K⁻¹ + GᵀΣ⁻¹G\)

  • \(y\) is the observation matrix

Parameters
  • state_space_model – Parametrises the latent chain.

  • emission_model – Maps the latent chain to the observations.

  • observations – Data with shape [num_transitions + 1, output_dim].

  • chol_obs_covariance – A TensorType with shape [output_dim, output_dim] for the Cholesky factor of the covariance to be applied to \(f\) from emission_model.
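
A hedged usage sketch: ssm and emission stand for a pre-built StateSpaceModel and EmissionModel (their construction is out of scope here); the remaining arguments follow the signature above.

import numpy as np
import tensorflow as tf
from markovflow.kalman_filter import KalmanFilter

observations = tf.constant(np.random.randn(10, 1))            # [num_transitions + 1, output_dim]
chol_obs_covariance = tf.constant([[0.1]], dtype=tf.float64)  # Cholesky factor of R

kf = KalmanFilter(
    state_space_model=ssm,        # assumed pre-built StateSpaceModel
    emission_model=emission,      # assumed pre-built EmissionModel
    observations=observations,
    chol_obs_covariance=chol_obs_covariance,
)
log_lik = kf.log_likelihood()                   # scalar tensor
posterior = kf.posterior_state_space_model()    # posterior as a StateSpaceModel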

property _r_inv[source]

Precision of the observation model

property observations[source]

Observation vector

class GaussianSites(name=None)[source]

Bases: tf.Module, abc.ABC

This class is a wrapper around the parameters specifying multiple independent Gaussian distributions.

property means[source]

Return the means of the Gaussians.

property precisions[source]

Return the precisions of the Gaussians.

property log_det_precisions[source]

Return the sum of the log determinant of the observation precisions.

class UnivariateGaussianSitesNat(nat1, nat2, log_norm=None)[source]

Bases: GaussianSites

This class is a wrapper around parameters of univariate Gaussian distributions in the natural form. That is:

\[p(f) = exp(ηᵀφ(f) - A(η))\]

…where \(η = [η₁, η₂]\) and \(φ(f) = [f, f²]\).

The mean \(μ\) and variance \(σ²\) are related to the natural parameters by:

\[μ = -½ η₁/η₂, σ² = -½ η₂⁻¹\]

Parameters
  • nat1 – first natural parameter [N, D]

  • nat2 – second natural parameter [N, D, D]

  • log_norm – normalizer parameter [N, D]
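
A quick numeric check of the identities above, converting between moment and natural parameters (values are illustrative):

import numpy as np

mu, var = 0.3, 0.5                         # moment parameterization (μ, σ²)
nat1, nat2 = mu / var, -0.5 / var          # η₁ = μ/σ², η₂ = -1/(2σ²)

assert np.isclose(-0.5 * nat1 / nat2, mu)  # μ = -½ η₁/η₂
assert np.isclose(-0.5 / nat2, var)        # σ² = -½ η₂⁻¹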

property means[source]

Return the means of the Gaussians.

property precisions[source]

Return the precisions of the Gaussians.

property log_det_precisions[source]

Return the sum of the log determinant of the observation precisions.

class KalmanFilterWithSites(state_space_model: markovflow.state_space_model.StateSpaceModel, emission_model: markovflow.emission_model.EmissionModel, sites: GaussianSites)[source]

Bases: BaseKalmanFilter

Performs a Kalman filter on a StateSpaceModel and EmissionModel, with Gaussian sites; that is, time-dependent Gaussian likelihood terms.

The key reference is:

@inproceedings{grigorievskiy2017parallelizable,
    title={Parallelizable sparse inverse formulation Gaussian processes (SpInGP)},
    author={Grigorievskiy, Alexander and Lawrence, Neil and S{\"a}rkk{\"a}, Simo},
    booktitle={Int'l Workshop on Machine Learning for Signal Processing (MLSP)},
    pages={1--6},
    year={2017},
    organization={IEEE}
}

The following notation from the above paper is used:

  • \(G = I_N ⊗ H\), where \(⊗\) is the Kronecker product

  • \(R = [R₁, R₂, ... Rₙ]\) is the observation covariance

  • \(Σ = blockdiag[R]\)

  • \(K⁻¹ = A⁻ᵀQ⁻¹A⁻¹\) is the precision, where \(A⁻ᵀ = [Aᵀ]⁻¹ = [A⁻¹]ᵀ\)

  • \(L\) is the Cholesky of \(K⁻¹ + GᵀΣ⁻¹G\). That is, \(LLᵀ = K⁻¹ + GᵀΣ⁻¹G\)

  • \(y\) is the observation matrix

Parameters
  • state_space_model – Parametrises the latent chain.

  • emission_model – Maps the latent chain to the observations.

  • sites – Gaussian sites parameterizing the Gaussian likelihoods.
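
A hedged construction sketch: per-time-step noise variances are converted to natural parameters (via the UnivariateGaussianSitesNat identities above) and passed as sites; ssm and emission are again assumed pre-built, as in the earlier example.

import numpy as np
import tensorflow as tf
from markovflow.kalman_filter import KalmanFilterWithSites, UnivariateGaussianSitesNat

# Time-dependent observation noise: one site per observation (illustrative values).
y = np.random.randn(10, 1)
noise_vars = np.linspace(0.1, 1.0, 10).reshape(10, 1)   # per-step σ²ₖ

sites = UnivariateGaussianSitesNat(
    nat1=tf.constant(y / noise_vars),                   # η₁ = μ/σ², shape [N, D]
    nat2=tf.constant((-0.5 / noise_vars)[..., None]),   # η₂ = -1/(2σ²), shape [N, D, D]
)

kf = KalmanFilterWithSites(state_space_model=ssm, emission_model=emission, sites=sites)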

property _r_inv[source]

Precisions of the observation model

property _log_det_observation_precision[source]

Sum of the log determinants of the precisions of the observation model

property observations[source]

Observation vector

class KalmanFilterWithSparseSites(state_space_model: markovflow.state_space_model.StateSpaceModel, emission_model: markovflow.emission_model.EmissionModel, sites: GaussianSites, num_grid_points: int, observations_index: tf.Tensor, observations: tf.Tensor)[source]

Bases: BaseKalmanFilter

Performs a Kalman filter on a StateSpaceModel and EmissionModel, with Gaussian sites, over a time grid.

Parameters
  • state_space_model – Parametrises the latent chain.

  • emission_model – Maps the latent chain to the observations.

  • sites – Gaussian sites over the observations.

  • num_grid_points – Number of grid points.

  • observations_index – Index of the observations in the time grid with shape (N,).

  • observations – Sparse observations with shape [n_batch] + [N, output_dim].

property _r_inv[source]

Precisions of the observation model over the time grid.

_drop_batch_shape(tensor: tf.Tensor)[source]

Check that the batch dimension, if present, is equal to 1, and drop it.

property _log_det_observation_precision[source]

Sum of the log determinants of the precisions of the observation model. Only the data sites are included, since the precision of every other site is zero.

property observations[source]

Sparse observation vector

property _r_inv_data[source]

Precisions of the observation model for only the data sites.

sparse_to_dense(tensor: tf.Tensor, output_shape: tf.TensorShape) → tf.Tensor[source]

Convert a sparse tensor to a dense one on the basis of the observations index. The output tensor has shape output_shape.

dense_to_sparse(tensor: tf.Tensor) → tf.Tensor[source]

Convert a dense tensor to a sparse one on the basis of the observations index.
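
One plausible mechanism for this scatter/gather pair, built on tf.scatter_nd and tf.gather_nd over the observations index (a hedged sketch with illustrative shapes, not the library's actual implementation; note that scatter_nd wants rank-2 indices):

import tensorflow as tf

observations_index = tf.constant([[1], [3], [4]])   # grid positions of the N=3 data points

def sparse_to_dense(tensor, output_shape):
    # Scatter the N observation rows into a zero tensor over the full grid.
    return tf.scatter_nd(observations_index, tensor, output_shape.as_list())

def dense_to_sparse(tensor):
    # Gather back only the rows that correspond to actual observations.
    return tf.gather_nd(tensor, observations_index)

y_sparse = tf.constant([[1.0], [2.0], [3.0]])
y_dense = sparse_to_dense(y_sparse, tf.TensorShape([6, 1]))   # zeros except rows 1, 3, 4
assert tf.reduce_all(dense_to_sparse(y_dense) == y_sparse)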

log_likelihood() → tf.Tensor[source]

Construct a TensorFlow function to compute the likelihood.

For more mathematical detail, see the log_likelihood function of the parent class. The main difference from the parent class is that the vector of observations is now sparse.

Returns

The likelihood as a scalar tensor (we sum over the batch_shape).