We demonstrate a MultiStageLikelihood driven by a multi-output SVGP model. We fit the model to samples from the MultiStageLikelihood, whose parameters are given by toy functions drawn from a Gaussian process.
[1]:
import gpflow
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from gpflow import set_trainable
from gpflow.ci_utils import ci_niter
from matplotlib import pyplot as plt
[2]:
import markovflow.kernels as mfk
from markovflow.models.sparse_variational import SparseVariationalGaussianProcess as SVGP
from markovflow.ssm_natgrad import SSMNaturalGradient
from markovflow.likelihoods.mutlistage_likelihood import MultiStageLikelihood
We draw toy functions for the three likelihood parameters from a Gaussian process.
[3]:
N = 100  # number of training points
X_train = np.arange(N).astype(float)
L = 3  # number of latent functions
[4]:
# Define the kernel
k1a = gpflow.kernels.Periodic(
    gpflow.kernels.Matern52(variance=1.0, lengthscales=3.0), period=12.0
)
k1b = gpflow.kernels.Matern52(variance=1.0, lengthscales=30.0)
k2 = gpflow.kernels.Matern32(variance=0.1, lengthscales=5.0)
k = k1a * k1b + k2

# Draw three independent functions from the same Gaussian process
X = X_train
num_latent = L
K = k(X[:, None])
np.random.seed(123)
v = np.random.randn(len(K), num_latent)
# We draw samples from a GP with kernel k(.) evaluated at X by reparameterizing:
# f ~ N(0, K) → f = chol(K) v, v ~ N(0, I), where chol(K) chol(K)ᵀ = K
f = np.linalg.cholesky(K + 1e-6 * np.eye(len(K))) @ v

# We shift the third function to increase the mean of the Poisson component to 20
# to make it easier to identify
f += np.array([0.0, 0.0, np.log(20)]).reshape(1, L)

# Plot all three functions
plt.figure()
for i in range(num_latent):
    plt.plot(X, f[:, i])
_ = plt.xticks(np.arange(0, 100, 12))
The above latent GPs represent how the three likelihood parameters will change over time (in particular, the third function drives the rate of the likelihood's Poisson component on a log scale, which is why we shifted it by log 20 above to give a mean of 20).
[5]:
# Define the likelihood
lik = MultiStageLikelihood()
# Draw observations from the likelihood given the functions `f` from the previous step
Y = lik.sample_y(tf.convert_to_tensor(f, dtype=gpflow.default_float()))
[6]:
# Plot the observations
_ = plt.plot(X, Y, ".")
[7]:
Y_train = Y
data = (X_train, Y_train)
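Since the likelihood produces count observations, a quick tabulation of the sampled values gives a feel for the data. This is a small hypothetical check, not part of the original notebook; it only assumes the observations are non-negative counts, as the Poisson component suggests.

# A quick look at the sampled observations: shape and the frequency of each count value.
print("Y_train shape:", Y_train.shape)
values, counts = np.unique(np.asarray(Y_train), return_counts=True)
print("distinct values:", values.astype(int), "with frequencies:", counts)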
(Note: to replicate a realistic modelling task, we pretend we don't know the underlying processes that generated this artificial data.)
We create three GPs, each with an independent Matern kernel, because we need three functions to drive the three parameters of the likelihood. This corresponds to learning 6 hyperparameters (3 GPs with 2 hyperparameters each), which we confirm with a quick check after building the multi-output kernel below.
[8]:
# Create kernels
kern_list = [mfk.Matern32(variance=1.0, lengthscale=10.0, jitter=1e-6) for _ in range(L)]
[9]:
# Create multi-output kernel from kernel list ker = mfk.IndependentMultiOutput(kern_list)
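As a quick check on the "6 hyperparameters" mentioned above, we can list the kernel's trainable variables. This is a sketch rather than part of the original notebook; it assumes the markovflow kernel, like any `tf.Module`, exposes them via `trainable_variables`.

# Expect 3 kernels x (lengthscale, variance) = 6 trainable hyperparameters.
for v in ker.trainable_variables:
    print(v.name, v.numpy())
print("total:", len(ker.trainable_variables))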
[10]:
# Create evenly spaced inducing points
num_inducing = N // 10
Z = np.linspace(X_train.min(), X_train.max(), num_inducing)
[11]:
# create multi-output inducing variables from Z
inducing_variable = tf.constant(Z)
[12]:
likelihood = MultiStageLikelihood()
[13]:
model = SVGP(
    kernel=ker,
    likelihood=likelihood,
    inducing_points=inducing_variable,
    mean_function=None,
)
[14]:
X_grid = X
NatGrads and Adam for SVGP
[15]:
adam_learning_rate = 0.001
natgrad_learning_rate = 0.05
[16]:
adam_opt = tf.optimizers.Adam(learning_rate=adam_learning_rate)
natgrad_opt = SSMNaturalGradient(gamma=natgrad_learning_rate, momentum=False)

# Stop Adam from optimizing the variational parameters
set_trainable(model.dist_q, False)
adam_var_list = model.trainable_variables
set_trainable(model.dist_q, True)
We separate the training process into two sets of parameters: the hyperparameters and the variational parameters, which in practice can result in better training.
For the variational parameters we can take much larger steps using natural gradients, whereas the hyperparameters take smaller steps using Adam.
This makes the optimisation more efficient, that is, faster and with better results.
[17]:
# Variables optimized by the Adam optimizer:
print(adam_var_list)
(<tf.Variable 'lengthscale:0' shape=() dtype=float64, numpy=9.99995459903963>, <tf.Variable 'variance:0' shape=() dtype=float64, numpy=0.5413248546129181>, <tf.Variable 'lengthscale:0' shape=() dtype=float64, numpy=9.99995459903963>, <tf.Variable 'variance:0' shape=() dtype=float64, numpy=0.5413248546129181>, <tf.Variable 'lengthscale:0' shape=() dtype=float64, numpy=9.99995459903963>, <tf.Variable 'variance:0' shape=() dtype=float64, numpy=0.5413248546129181>)
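The variational parameters, by contrast, live on `model.dist_q` and will only be updated by the natural-gradient optimizer. The following sketch (not in the original notebook, and assuming `dist_q` exposes its variables via `trainable_variables` like any `tf.Module`) confirms they are kept separate from `adam_var_list`:

# These are the variables the natural-gradient optimizer will update;
# none of them appear in adam_var_list above.
print("number of variational variables:", len(model.dist_q.trainable_variables))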
[18]:
@tf.function
def model_loss():
    return -model.elbo(data)
[19]:
print(model_loss().numpy())
1736.3434616433155
[20]:
@tf.function
def step():
    # first take step with hyper-parameters with Adam
    adam_opt.minimize(model_loss, var_list=adam_var_list)
    # then variational parameters with NatGrad
    natgrad_opt.minimize(model_loss, model.dist_q)
[21]:
@tf.function
def sample_y(X, num_samples, correlated: bool = False):
    if correlated:
        # this path may give Cholesky errors
        f_samples = model.posterior.sample_f(X, num_samples)
    else:
        f_mean, f_var = model.posterior.predict_f(X)
        f_samples = tfp.distributions.Normal(f_mean, tf.sqrt(f_var)).sample(num_samples)
    return likelihood.sample_y(f_samples)
[22]:
# the arguments to a tf.function-wrapped function need to be Tensors, not numpy arrays:
X_grid_tensor = tf.convert_to_tensor(X_grid)
[23]:
iterations = ci_niter(100)

# run the optimization
for it in range(iterations):
    if it % 10 == 0:
        plt.figure()
        y_samples = sample_y(X_grid_tensor, 1000)
        lower, upper = np.quantile(y_samples, q=[0.05, 0.95], axis=0).squeeze(-1)
        plt.plot(X_grid, lower, "b-")
        plt.plot(X_grid, upper, "b-")
        # f_samples = model.posterior.sample_f(X_grid, 10)
        # plt.plot(X_grid, f_samples[..., 0].numpy().T, 'b-')
        plt.plot(X_train, Y_train, "k.")
        # plt.savefig("test_%03d.png" % it)
        # plt.close()
    step()
    print(it, model_loss())
0 tf.Tensor(566427.6624660543, shape=(), dtype=float64) 1 tf.Tensor(145685.75092924671, shape=(), dtype=float64) 2 tf.Tensor(117069.46294072768, shape=(), dtype=float64) 3 tf.Tensor(100046.35228148963, shape=(), dtype=float64) 4 tf.Tensor(88329.46709546277, shape=(), dtype=float64) 5 tf.Tensor(79581.63134721595, shape=(), dtype=float64) 6 tf.Tensor(72700.18054205697, shape=(), dtype=float64) 7 tf.Tensor(67085.70905573157, shape=(), dtype=float64) 8 tf.Tensor(62379.80781619742, shape=(), dtype=float64) 9 tf.Tensor(58352.87891801658, shape=(), dtype=float64) 10 tf.Tensor(54849.937364569334, shape=(), dtype=float64) 11 tf.Tensor(51761.96472044088, shape=(), dtype=float64) 12 tf.Tensor(49009.67938544857, shape=(), dtype=float64) 13 tf.Tensor(46533.81755362382, shape=(), dtype=float64) 14 tf.Tensor(44289.0430416615, shape=(), dtype=float64) 15 tf.Tensor(42239.983885281865, shape=(), dtype=float64) 16 tf.Tensor(40358.56892555485, shape=(), dtype=float64) 17 tf.Tensor(38622.18797572269, shape=(), dtype=float64) 18 tf.Tensor(37012.39010245702, shape=(), dtype=float64) 19 tf.Tensor(35513.94308378339, shape=(), dtype=float64) 20 tf.Tensor(34114.14108708896, shape=(), dtype=float64) 21 tf.Tensor(32802.28655123121, shape=(), dtype=float64) 22 tf.Tensor(31569.2966400661, shape=(), dtype=float64) 23 tf.Tensor(30407.4002854263, shape=(), dtype=float64) 24 tf.Tensor(29309.902118940074, shape=(), dtype=float64) 25 tf.Tensor(28270.99647830591, shape=(), dtype=float64) 26 tf.Tensor(27285.619376546158, shape=(), dtype=float64) 27 tf.Tensor(26349.3295861843, shape=(), dtype=float64) 28 tf.Tensor(25458.212291016684, shape=(), dtype=float64) 29 tf.Tensor(24608.80040242107, shape=(), dtype=float64) 30 tf.Tensor(23798.00982870771, shape=(), dtype=float64) 31 tf.Tensor(23023.08585898673, shape=(), dtype=float64) 32 tf.Tensor(22281.558469928706, shape=(), dtype=float64) 33 tf.Tensor(21571.204848825324, shape=(), dtype=float64) 34 tf.Tensor(20890.017792703627, shape=(), dtype=float64) 35 tf.Tensor(20236.17892302155, shape=(), dtype=float64) 36 tf.Tensor(19608.035870598662, shape=(), dtype=float64) 37 tf.Tensor(19004.082752570495, shape=(), dtype=float64) 38 tf.Tensor(18422.94339367364, shape=(), dtype=float64) 39 tf.Tensor(17863.356846636118, shape=(), dtype=float64) 40 tf.Tensor(17324.164848114506, shape=(), dtype=float64) 41 tf.Tensor(16804.300911543283, shape=(), dtype=float64) 42 tf.Tensor(16302.780809984533, shape=(), dtype=float64) 43 tf.Tensor(15818.694244419685, shape=(), dtype=float64) 44 tf.Tensor(15351.197526837825, shape=(), dtype=float64) 45 tf.Tensor(14899.507135193086, shape=(), dtype=float64) 46 tf.Tensor(14462.894020149868, shape=(), dtype=float64) 47 tf.Tensor(14040.678562245359, shape=(), dtype=float64) 48 tf.Tensor(13632.226093469728, shape=(), dtype=float64) 49 tf.Tensor(13236.942910237638, shape=(), dtype=float64) 50 tf.Tensor(12854.272715343459, shape=(), dtype=float64) 51 tf.Tensor(12483.693435557652, shape=(), dtype=float64) 52 tf.Tensor(12124.714368895895, shape=(), dtype=float64) 53 tf.Tensor(11776.873622087049, shape=(), dtype=float64) 54 tf.Tensor(11439.735804047792, shape=(), dtype=float64) 55 tf.Tensor(11112.889945730858, shape=(), dtype=float64) 56 tf.Tensor(10795.947620637473, shape=(), dtype=float64) 57 tf.Tensor(10488.54124357179, shape=(), dtype=float64) 58 tf.Tensor(10190.322527973076, shape=(), dtype=float64) 59 tf.Tensor(9900.961084757546, shape=(), dtype=float64) 60 tf.Tensor(9620.143147552008, shape=(), dtype=float64) 61 tf.Tensor(9347.57041108815, shape=(), dtype=float64) 62 
tf.Tensor(9082.958971082917, shape=(), dtype=float64) 63 tf.Tensor(8826.038355264915, shape=(), dtype=float64) 64 tf.Tensor(8576.550636433969, shape=(), dtype=float64) 65 tf.Tensor(8334.249619437243, shape=(), dtype=float64) 66 tf.Tensor(8098.900094859939, shape=(), dtype=float64) 67 tf.Tensor(7870.277153038106, shape=(), dtype=float64) 68 tf.Tensor(7648.165552664937, shape=(), dtype=float64) 69 tf.Tensor(7432.359138884239, shape=(), dtype=float64) 70 tf.Tensor(7222.660306324423, shape=(), dtype=float64) 71 tf.Tensor(7018.8795029542725, shape=(), dtype=float64) 72 tf.Tensor(6820.834771124197, shape=(), dtype=float64) 73 tf.Tensor(6628.351322465935, shape=(), dtype=float64) 74 tf.Tensor(6441.261143708526, shape=(), dtype=float64) 75 tf.Tensor(6259.402630724686, shape=(), dtype=float64) 76 tf.Tensor(6082.620248409402, shape=(), dtype=float64) 77 tf.Tensor(5910.764214206241, shape=(), dtype=float64) 78 tf.Tensor(5743.690203323477, shape=(), dtype=float64) 79 tf.Tensor(5581.259073857898, shape=(), dtype=float64) 80 tf.Tensor(5423.336610210406, shape=(), dtype=float64) 81 tf.Tensor(5269.7932833383475, shape=(), dtype=float64) 82 tf.Tensor(5120.5040265106845, shape=(), dtype=float64) 83 tf.Tensor(4975.348025363184, shape=(), dtype=float64) 84 tf.Tensor(4834.208521148247, shape=(), dtype=float64) 85 tf.Tensor(4696.972626187602, shape=(), dtype=float64) 86 tf.Tensor(4563.53115060602, shape=(), dtype=float64) 87 tf.Tensor(4433.778439514702, shape=(), dtype=float64) 88 tf.Tensor(4307.612219880307, shape=(), dtype=float64) 89 tf.Tensor(4184.933456379909, shape=(), dtype=float64) 90 tf.Tensor(4065.646215602913, shape=(), dtype=float64) 91 tf.Tensor(3949.6575380085596, shape=(), dtype=float64) 92 tf.Tensor(3836.8773170982026, shape=(), dtype=float64) 93 tf.Tensor(3727.2181853032125, shape=(), dtype=float64) 94 tf.Tensor(3620.595406132735, shape=(), dtype=float64) 95 tf.Tensor(3516.926772152053, shape=(), dtype=float64) 96 tf.Tensor(3416.1325084042755, shape=(), dtype=float64) 97 tf.Tensor(3318.1351809090165, shape=(), dtype=float64) 98 tf.Tensor(3222.8596099082806, shape=(), dtype=float64) 99 tf.Tensor(3130.232787541596, shape=(), dtype=float64)
[24]:
Fmean, Fvar = model.predict_f(X_grid)
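Before plotting, it can help to confirm the shapes of the predictions. Given how the next cell indexes them, `Fmean` and `Fvar` should each have one column per latent function; this is a quick sanity check, not part of the original notebook.

# Expect one column per latent function, i.e. shape (N, L) for both tensors.
print(Fmean.shape, Fvar.shape)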
[25]:
for i in range(num_latent):
    plt.plot(X_grid, Fmean[:, i], f"C{i}-")
    plt.fill_between(
        X_grid,
        Fmean[:, i] - 2 * tf.sqrt(Fvar[:, i]),
        Fmean[:, i] + 2 * tf.sqrt(Fvar[:, i]),
        color=f"C{i}",
        alpha=0.3,
    )
    plt.plot(X, f[:, i], f"C{i}--")

# Custom legend:
from matplotlib.patches import Patch
from matplotlib.lines import Line2D

plt.legend(
    loc="upper right",
    handles=[
        Line2D([0], [0], color="k", ls="--", label="Ground truth"),
        Line2D([0], [0], color="k", label="Inferred mean"),
        Patch(facecolor="k", alpha=0.3, label="Inferred uncertainty"),
    ],
)
<matplotlib.legend.Legend at 0x7fbeae72d690>
The above plot shows that our three independent GPs with Matern kernels have done a reasonable job of recovering the ground truth.
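As a rough quantitative counterpart to the visual comparison, we can compute a per-function RMSE between the inferred posterior means and the ground-truth draws (a hypothetical check, not part of the original notebook):

# Root-mean-square error of each inferred mean against the corresponding ground-truth function.
rmse = np.sqrt(np.mean((np.asarray(Fmean) - f) ** 2, axis=0))
print("per-function RMSE:", rmse)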