Using deep Gaussian processes with GPflux for Bayesian optimization#

[1]:
import numpy as np
import tensorflow as tf

# Fix the random seeds for reproducibility.
np.random.seed(1794)
tf.random.set_seed(1794)

Describe the problem#

In this notebook, we show how to use deep Gaussian processes (DGPs) for Bayesian optimization using Trieste and GPflux. DGPs may be better for modeling non-stationary objective functions than standard GP surrogates, as discussed in [DKvdH+17, HBB+19].

In this example, we look to find the minimum value of the two- and five-dimensional Michalewicz functions over the hypercubes \([0, \pi]^2\) and \([0, \pi]^5\). We compare a two-layer DGP model against a standard GP regression (GPR) model, using Thompson sampling for both.
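For reference, the \(d\)-dimensional Michalewicz function is commonly defined (with steepness parameter \(m = 10\)) as

\[
f(\mathbf{x}) = -\sum_{i=1}^{d} \sin(x_i)\,\sin^{2m}\!\left(\frac{i\,x_i^2}{\pi}\right),
\]

where larger values of \(m\) produce steeper, narrower valleys.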

The Michalewicz functions are highly non-stationary and have global minima that are hard to find. This makes them promising candidates for DGPs: standard GPs typically use stationary kernels, which cannot easily model such non-stationarity.

[2]:
import gpflow
from trieste.objectives import Michalewicz2, Michalewicz5
from trieste.objectives.utils import mk_observer
from trieste.experimental.plotting import plot_function_plotly

function = Michalewicz2.objective
F_MINIMIZER = Michalewicz2.minimum

search_space = Michalewicz2.search_space

fig = plot_function_plotly(function, search_space.lower, search_space.upper)
fig.show()
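Later in the notebook we build the surrogate models and run the optimization loop; as a rough preview, the pieces fit together roughly as follows. This is a minimal sketch, assuming Trieste's build_vanilla_deep_gp helper, DeepGaussianProcess wrapper, and DiscreteThompsonSampling acquisition rule; the helper's keyword arguments and defaults may differ between Trieste versions, so treat the argument values here as placeholders rather than the exact settings used below.

from trieste.acquisition.rule import DiscreteThompsonSampling
from trieste.bayesian_optimizer import BayesianOptimizer
from trieste.models.gpflux import DeepGaussianProcess, build_vanilla_deep_gp

# Wrap the objective as an observer and collect some initial observations.
observer = mk_observer(function)
initial_data = observer(search_space.sample_sobol(20))

# Build a two-layer DGP surrogate; the keyword arguments here are
# assumptions and may need adjusting for your Trieste version.
dgp = build_vanilla_deep_gp(
    initial_data,
    search_space,
    num_layers=2,
    num_inducing_points=100,
)
model = DeepGaussianProcess(dgp)

# Thompson sampling: sample the model posterior over a random
# discretization of the search space and query the sampled minimizer.
acquisition_rule = DiscreteThompsonSampling(1000, 1)

bo = BayesianOptimizer(observer, search_space)
result = bo.optimize(25, initial_data, model, acquisition_rule=acquisition_rule)
print(result.try_get_final_dataset().observations.numpy().min())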