Demo of MultiStageLikelihood with plain SVGP model

We demonstrate a MultiStageLikelihood driven by a multi-output SVGP model. We fit the model to observations sampled from the MultiStageLikelihood, whose parameters are given by toy functions drawn from a Gaussian process.

[1]:
import gpflow
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from gpflow import set_trainable
from gpflow.ci_utils import ci_niter
from matplotlib import pyplot as plt
[2]:
import markovflow.kernels as mfk
from markovflow.models.sparse_variational import SparseVariationalGaussianProcess as SVGP
from markovflow.ssm_natgrad import SSMNaturalGradient
from markovflow.likelihoods.mutlistage_likelihood import MultiStageLikelihood

Generate artificial data

We draw toy functions for the three likelihood parameters from a Gaussian process.

[3]:
N = 100  # number of training points
X_train = np.arange(N).astype(float)
L = 3  # number of latent functions
[4]:
# Define the kernel
k1a = gpflow.kernels.Periodic(
    gpflow.kernels.Matern52(variance=1.0, lengthscales=3.0), period=12.0
)
k1b = gpflow.kernels.Matern52(variance=1.0, lengthscales=30.0)
k2 = gpflow.kernels.Matern32(variance=0.1, lengthscales=5.0)
k = k1a * k1b + k2

# Draw three independent functions from the same Gaussian process
X = X_train
num_latent = L
K = k(X[:, None])
np.random.seed(123)
v = np.random.randn(len(K), num_latent)
# We draw samples from a GP with kernel k(.) evaluated at X by reparameterizing:
# f ~ N(0, K) → f = chol(K) v, v ~ N(0, I), where chol(K) chol(K)ᵀ = K
f = np.linalg.cholesky(K + 1e-6 * np.eye(len(K))) @ v

# We shift the third function upwards so that the mean of the Poisson component is around 20, making it easier to identify
f += np.array([0.0, 0.0, np.log(20)]).reshape(1, L)

# Plot all three functions
plt.figure()
for i in range(num_latent):
    plt.plot(X, f[:, i])
_ = plt.xticks(np.arange(0, 100, 12))
../_images/notebooks_multistage_likelihood_5_1.png

The above latent GPs represent how the three likelihood parameters will change over time.
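
To make the roles of the three latents concrete, here is a rough, hypothetical sketch of a three-stage count sampler: one Bernoulli stage for "zero vs. non-zero", one for "one vs. more than one", and a shifted Poisson for the remaining counts. The link functions and exact parameterization used by MultiStageLikelihood may differ; in this notebook we only rely on its sample_y method.

def sample_multistage_sketch(f, rng):
    """Hypothetical three-stage sampler driven by latents f[:, 0:3] (illustration only)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    is_zero = rng.binomial(1, sigmoid(f[:, 0])) == 1   # stage 1: zero vs. non-zero
    is_one = rng.binomial(1, sigmoid(f[:, 1])) == 1    # stage 2: one vs. more than one
    counts = 2 + rng.poisson(np.exp(f[:, 2]))          # stage 3: shifted Poisson for counts >= 2
    return np.where(is_zero, 0, np.where(is_one, 1, counts))

# Example usage with the latent draws `f` from above:
# y_sketch = sample_multistage_sketch(f, np.random.default_rng(0))
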

[5]:
# Define the likelihood
lik = MultiStageLikelihood()
# Draw observations from the likelihood given the functions `f` from the previous step
Y = lik.sample_y(tf.convert_to_tensor(f, dtype=gpflow.default_float()))
[6]:
# Plot the observations
_ = plt.plot(X, Y, ".")
../_images/notebooks_multistage_likelihood_8_0.png
[7]:
Y_train = Y
data = (X_train, Y_train)

Create the model

(Note: to mimic a realistic modelling task, we pretend we don’t know the underlying processes that generated the artificial data.)

We create three independent GPs, each with its own Matern kernel, because we need three latent functions to drive the three parameters of the likelihood. This corresponds to learning 6 hyperparameters (3 GPs with 2 hyperparameters each: a variance and a lengthscale); the sketch after the next cell checks this count.

[8]:
# Create kernels
kern_list = [mfk.Matern32(variance=1.0, lengthscale=10.0, jitter=1e-6) for _ in range(L)]
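
As a quick sanity check of the hyperparameter count, a minimal sketch (assuming each markovflow kernel exposes its variance and lengthscale as its only trainable variables, which the printout of adam_var_list later confirms):

n_hypers = sum(len(k.trainable_variables) for k in kern_list)
print(n_hypers)  # expected: 2 hyperparameters per kernel, i.e. 6 in total
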
[9]:
# Create multi-output kernel from kernel list
ker = mfk.IndependentMultiOutput(kern_list)
[10]:
# Create evenly spaced inducing points
num_inducing = N // 10
Z = np.linspace(X_train.min(), X_train.max(), num_inducing)
[11]:
# Create multi-output inducing variables from Z
inducing_variable = tf.constant(Z)
[12]:
likelihood = MultiStageLikelihood()
[13]:
model = SVGP(
    kernel=ker, likelihood=likelihood, inducing_points=inducing_variable, mean_function=None,
)
[14]:
X_grid = X

Optimise the model

NatGrads and Adam for SVGP

[15]:
adam_learning_rate = 0.001
natgrad_learning_rate = 0.05
[16]:
adam_opt = tf.optimizers.Adam(learning_rate=adam_learning_rate)
natgrad_opt = SSMNaturalGradient(gamma=natgrad_learning_rate, momentum=False)

# Stop Adam from optimizing the variational parameters

set_trainable(model.dist_q, False)
adam_var_list = model.trainable_variables
set_trainable(model.dist_q, True)

We split the training variables into two groups: the kernel hyperparameters and the variational parameters. In practice this separation can result in better training.

For the variational parameters we can take much larger steps using natural gradients, whereas the hyperparameters take smaller steps using Adam (as the step function defined below shows).

This makes the optimisation more efficient, typically converging faster and to better optima.

[17]:
# Variables optimized by the Adam optimizer:
print(adam_var_list)
(<tf.Variable 'lengthscale:0' shape=() dtype=float64, numpy=9.99995459903963>, <tf.Variable 'variance:0' shape=() dtype=float64, numpy=0.5413248546129181>, <tf.Variable 'lengthscale:0' shape=() dtype=float64, numpy=9.99995459903963>, <tf.Variable 'variance:0' shape=() dtype=float64, numpy=0.5413248546129181>, <tf.Variable 'lengthscale:0' shape=() dtype=float64, numpy=9.99995459903963>, <tf.Variable 'variance:0' shape=() dtype=float64, numpy=0.5413248546129181>)
[18]:
@tf.function
def model_loss():
    return -model.elbo(data)
[19]:
print(model_loss().numpy())
1736.3434616433155
[20]:
@tf.function
def step():
    # First, take a step on the hyperparameters with Adam
    adam_opt.minimize(model_loss, var_list=adam_var_list)
    # Then update the variational parameters with a natural-gradient step
    natgrad_opt.minimize(model_loss, model.dist_q)
[21]:
@tf.function
def sample_y(X, num_samples, correlated: bool = False):
    if correlated:
        # Sampling correlated functions involves a Cholesky factorisation, which may fail numerically
        f_samples = model.posterior.sample_f(X, num_samples)
    else:
        f_mean, f_var = model.posterior.predict_f(X)
        f_samples = tfp.distributions.Normal(f_mean, tf.sqrt(f_var)).sample(num_samples)
    return likelihood.sample_y(f_samples)
[22]:
# the arguments to a tf.function-wrapped function need to be Tensors, not numpy arrays:
X_grid_tensor = tf.convert_to_tensor(X_grid)
[23]:
iterations = ci_niter(100)

# Run the optimisation, plotting the 5%–95% predictive quantiles every 10 iterations
for it in range(iterations):
    if it % 10 == 0:

        plt.figure()
        y_samples = sample_y(X_grid_tensor, 1000)
        lower, upper = np.quantile(y_samples, q=[0.05, 0.95], axis=0).squeeze(-1)
        plt.plot(X_grid, lower, "b-")
        plt.plot(X_grid, upper, "b-")
        # f_samples = model.posterior.sample_f(X_grid, 10)
        # plt.plot(X_grid, f_samples[..., 0].numpy().T, 'b-')
        plt.plot(X_train, Y_train, "k.")
        # plt.savefig("test_%03d.png" % it)
        # plt.close()

    step()
    print(it, model_loss())
0 tf.Tensor(566427.6624660543, shape=(), dtype=float64)
1 tf.Tensor(145685.75092924671, shape=(), dtype=float64)
2 tf.Tensor(117069.46294072768, shape=(), dtype=float64)
3 tf.Tensor(100046.35228148963, shape=(), dtype=float64)
4 tf.Tensor(88329.46709546277, shape=(), dtype=float64)
5 tf.Tensor(79581.63134721595, shape=(), dtype=float64)
6 tf.Tensor(72700.18054205697, shape=(), dtype=float64)
7 tf.Tensor(67085.70905573157, shape=(), dtype=float64)
8 tf.Tensor(62379.80781619742, shape=(), dtype=float64)
9 tf.Tensor(58352.87891801658, shape=(), dtype=float64)
10 tf.Tensor(54849.937364569334, shape=(), dtype=float64)
11 tf.Tensor(51761.96472044088, shape=(), dtype=float64)
12 tf.Tensor(49009.67938544857, shape=(), dtype=float64)
13 tf.Tensor(46533.81755362382, shape=(), dtype=float64)
14 tf.Tensor(44289.0430416615, shape=(), dtype=float64)
15 tf.Tensor(42239.983885281865, shape=(), dtype=float64)
16 tf.Tensor(40358.56892555485, shape=(), dtype=float64)
17 tf.Tensor(38622.18797572269, shape=(), dtype=float64)
18 tf.Tensor(37012.39010245702, shape=(), dtype=float64)
19 tf.Tensor(35513.94308378339, shape=(), dtype=float64)
20 tf.Tensor(34114.14108708896, shape=(), dtype=float64)
21 tf.Tensor(32802.28655123121, shape=(), dtype=float64)
22 tf.Tensor(31569.2966400661, shape=(), dtype=float64)
23 tf.Tensor(30407.4002854263, shape=(), dtype=float64)
24 tf.Tensor(29309.902118940074, shape=(), dtype=float64)
25 tf.Tensor(28270.99647830591, shape=(), dtype=float64)
26 tf.Tensor(27285.619376546158, shape=(), dtype=float64)
27 tf.Tensor(26349.3295861843, shape=(), dtype=float64)
28 tf.Tensor(25458.212291016684, shape=(), dtype=float64)
29 tf.Tensor(24608.80040242107, shape=(), dtype=float64)
30 tf.Tensor(23798.00982870771, shape=(), dtype=float64)
31 tf.Tensor(23023.08585898673, shape=(), dtype=float64)
32 tf.Tensor(22281.558469928706, shape=(), dtype=float64)
33 tf.Tensor(21571.204848825324, shape=(), dtype=float64)
34 tf.Tensor(20890.017792703627, shape=(), dtype=float64)
35 tf.Tensor(20236.17892302155, shape=(), dtype=float64)
36 tf.Tensor(19608.035870598662, shape=(), dtype=float64)
37 tf.Tensor(19004.082752570495, shape=(), dtype=float64)
38 tf.Tensor(18422.94339367364, shape=(), dtype=float64)
39 tf.Tensor(17863.356846636118, shape=(), dtype=float64)
40 tf.Tensor(17324.164848114506, shape=(), dtype=float64)
41 tf.Tensor(16804.300911543283, shape=(), dtype=float64)
42 tf.Tensor(16302.780809984533, shape=(), dtype=float64)
43 tf.Tensor(15818.694244419685, shape=(), dtype=float64)
44 tf.Tensor(15351.197526837825, shape=(), dtype=float64)
45 tf.Tensor(14899.507135193086, shape=(), dtype=float64)
46 tf.Tensor(14462.894020149868, shape=(), dtype=float64)
47 tf.Tensor(14040.678562245359, shape=(), dtype=float64)
48 tf.Tensor(13632.226093469728, shape=(), dtype=float64)
49 tf.Tensor(13236.942910237638, shape=(), dtype=float64)
50 tf.Tensor(12854.272715343459, shape=(), dtype=float64)
51 tf.Tensor(12483.693435557652, shape=(), dtype=float64)
52 tf.Tensor(12124.714368895895, shape=(), dtype=float64)
53 tf.Tensor(11776.873622087049, shape=(), dtype=float64)
54 tf.Tensor(11439.735804047792, shape=(), dtype=float64)
55 tf.Tensor(11112.889945730858, shape=(), dtype=float64)
56 tf.Tensor(10795.947620637473, shape=(), dtype=float64)
57 tf.Tensor(10488.54124357179, shape=(), dtype=float64)
58 tf.Tensor(10190.322527973076, shape=(), dtype=float64)
59 tf.Tensor(9900.961084757546, shape=(), dtype=float64)
60 tf.Tensor(9620.143147552008, shape=(), dtype=float64)
61 tf.Tensor(9347.57041108815, shape=(), dtype=float64)
62 tf.Tensor(9082.958971082917, shape=(), dtype=float64)
63 tf.Tensor(8826.038355264915, shape=(), dtype=float64)
64 tf.Tensor(8576.550636433969, shape=(), dtype=float64)
65 tf.Tensor(8334.249619437243, shape=(), dtype=float64)
66 tf.Tensor(8098.900094859939, shape=(), dtype=float64)
67 tf.Tensor(7870.277153038106, shape=(), dtype=float64)
68 tf.Tensor(7648.165552664937, shape=(), dtype=float64)
69 tf.Tensor(7432.359138884239, shape=(), dtype=float64)
70 tf.Tensor(7222.660306324423, shape=(), dtype=float64)
71 tf.Tensor(7018.8795029542725, shape=(), dtype=float64)
72 tf.Tensor(6820.834771124197, shape=(), dtype=float64)
73 tf.Tensor(6628.351322465935, shape=(), dtype=float64)
74 tf.Tensor(6441.261143708526, shape=(), dtype=float64)
75 tf.Tensor(6259.402630724686, shape=(), dtype=float64)
76 tf.Tensor(6082.620248409402, shape=(), dtype=float64)
77 tf.Tensor(5910.764214206241, shape=(), dtype=float64)
78 tf.Tensor(5743.690203323477, shape=(), dtype=float64)
79 tf.Tensor(5581.259073857898, shape=(), dtype=float64)
80 tf.Tensor(5423.336610210406, shape=(), dtype=float64)
81 tf.Tensor(5269.7932833383475, shape=(), dtype=float64)
82 tf.Tensor(5120.5040265106845, shape=(), dtype=float64)
83 tf.Tensor(4975.348025363184, shape=(), dtype=float64)
84 tf.Tensor(4834.208521148247, shape=(), dtype=float64)
85 tf.Tensor(4696.972626187602, shape=(), dtype=float64)
86 tf.Tensor(4563.53115060602, shape=(), dtype=float64)
87 tf.Tensor(4433.778439514702, shape=(), dtype=float64)
88 tf.Tensor(4307.612219880307, shape=(), dtype=float64)
89 tf.Tensor(4184.933456379909, shape=(), dtype=float64)
90 tf.Tensor(4065.646215602913, shape=(), dtype=float64)
91 tf.Tensor(3949.6575380085596, shape=(), dtype=float64)
92 tf.Tensor(3836.8773170982026, shape=(), dtype=float64)
93 tf.Tensor(3727.2181853032125, shape=(), dtype=float64)
94 tf.Tensor(3620.595406132735, shape=(), dtype=float64)
95 tf.Tensor(3516.926772152053, shape=(), dtype=float64)
96 tf.Tensor(3416.1325084042755, shape=(), dtype=float64)
97 tf.Tensor(3318.1351809090165, shape=(), dtype=float64)
98 tf.Tensor(3222.8596099082806, shape=(), dtype=float64)
99 tf.Tensor(3130.232787541596, shape=(), dtype=float64)
../_images/notebooks_multistage_likelihood_28_1.png
../_images/notebooks_multistage_likelihood_28_2.png
../_images/notebooks_multistage_likelihood_28_3.png
../_images/notebooks_multistage_likelihood_28_4.png
../_images/notebooks_multistage_likelihood_28_5.png
../_images/notebooks_multistage_likelihood_28_6.png
../_images/notebooks_multistage_likelihood_28_7.png
../_images/notebooks_multistage_likelihood_28_8.png
../_images/notebooks_multistage_likelihood_28_9.png
../_images/notebooks_multistage_likelihood_28_10.png

Compare inferred functions with ground truth

[24]:
Fmean, Fvar = model.predict_f(X_grid)
[25]:
for i in range(num_latent):
    plt.plot(X_grid, Fmean[:, i], f"C{i}-")
    plt.fill_between(
        X_grid,
        Fmean[:, i] - 2 * tf.sqrt(Fvar[:, i]),
        Fmean[:, i] + 2 * tf.sqrt(Fvar[:, i]),
        color=f"C{i}",
        alpha=0.3,
    )

    plt.plot(X, f[:, i], f"C{i}--")

# Custom legend:
from matplotlib.patches import Patch
from matplotlib.lines import Line2D

plt.legend(
    loc="upper right",
    handles=[
        Line2D([0], [0], color="k", ls="--", label="Ground truth"),
        Line2D([0], [0], color="k", label="Inferred mean"),
        Patch(facecolor="k", alpha=0.3, label="Inferred uncertainty"),
    ],
)
[25]:
<matplotlib.legend.Legend at 0x7fbeae72d690>
../_images/notebooks_multistage_likelihood_31_1.png

The above plot shows that our three independent GPs with Matern kernels have done a reasonable job of recovering the ground-truth functions.
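
For a quantitative counterpart to the visual comparison, a minimal sketch (assuming Fmean is a tensor of shape [N, L], as used in the plotting cell above) is to compute the per-latent RMSE against the ground-truth draws:

rmse = np.sqrt(np.mean((np.asarray(Fmean) - f) ** 2, axis=0))
for i, err in enumerate(rmse):
    print(f"latent {i}: RMSE = {err:.3f}")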