Benchmarks
We benchmark GPflux's Deep GP on several UCI datasets. The code to run the experiments can be found in benchmarking/main.py. The results are stored in benchmarking/runs/*.json. In this script we aggregate and plot the outcomes.
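The aggregation step can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes that each JSON file holds one flat record with the keys dataset, model, split, mse and nlpd (the key names are inferred from the table columns, not confirmed by the source), and the helper names load_records and aggregate are hypothetical.

```python
import glob
import json

import pandas as pd


def load_records(pattern="benchmarking/runs/*.json"):
    """Read one record per JSON file (assumed layout: one flat dict per file)."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            records.append(json.load(f))
    return records


def aggregate(records):
    """Group by (dataset, model); count the splits and summarise both metrics."""
    df = pd.DataFrame.from_records(records)
    return df.groupby(["dataset", "model"]).agg(
        {
            "split": ["count"],        # number of splits that were run
            "mse": ["mean", "std"],    # pandas uses the sample std (ddof=1)
            "nlpd": ["mean", "std"],
        }
    )
```

Calling aggregate(load_records()) would then give a frame indexed by dataset and model, with one (metric, statistic) column pair per reported number.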
We report the mean and standard deviation of the mean squared error (MSE) and the Negative Log Predictive Density (NLPD), computed over 5 different train/test splits of each dataset. Each split uses 90% of the data for training and the remaining 10% for testing. The output is normalised to have zero mean and unit variance.
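To make the two metrics and the split concrete, here is a small sketch under stated assumptions. It assumes a Gaussian predictive distribution at each test point, with the NLPD averaged over test points, and a random 90/10 permutation split; the benchmark code may define these differently. The function names are hypothetical.

```python
import numpy as np


def split_indices(n, seed, train_fraction=0.9):
    """Random permutation split; the seed plays the role of the split index."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return perm[:n_train], perm[n_train:]


def mse(y_true, pred_mean):
    """Mean squared error of the predictive mean."""
    return float(np.mean((y_true - pred_mean) ** 2))


def gaussian_nlpd(y_true, pred_mean, pred_var):
    """Average negative log density of y_true under N(pred_mean, pred_var)."""
    log_density = -0.5 * np.log(2.0 * np.pi * pred_var) \
        - 0.5 * (y_true - pred_mean) ** 2 / pred_var
    return float(-np.mean(log_density))
```

Because the outputs are normalised to unit variance, a predictor that always returns the mean of the normalised outputs would score an MSE of about 1, which gives a rough baseline for reading the numbers in the table.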
table
dataset | model | splits | MSE mean | MSE std | NLPD mean | NLPD std
---|---|---|---|---|---|---
Concrete | dgp-1 | 5 | 0.103785 | 0.014586 | 0.526873 | 0.231547
Concrete | dgp-2 | 5 | 0.093612 | 0.003917 | 0.388471 | 0.200387
Concrete | dgp-3 | 5 | 0.103213 | 0.019258 | 0.624335 | 0.409077
Energy | dgp-1 | 5 | 0.003866 | 0.001660 | -0.991852 | 0.065885
Energy | dgp-2 | 5 | 0.004071 | 0.001542 | -1.089672 | 0.039099
Energy | dgp-3 | 5 | 0.004063 | 0.001521 | -1.091651 | 0.039407
Kin8mn | dgp-1 | 5 | 0.098581 | 0.006733 | 0.263775 | 0.019575
Kin8mn | dgp-2 | 5 | 0.061714 | 0.002321 | 0.040491 | 0.026879
Kin8mn | dgp-3 | 5 | 0.064156 | 0.002981 | 0.144311 | 0.045383
Power | dgp-1 | 5 | 0.056407 | 0.004272 | -0.009102 | 0.045228
Power | dgp-2 | 5 | 0.044380 | 0.006752 | -0.129386 | 0.078303
Power | dgp-3 | 5 | 0.042464 | 0.005769 | -0.113741 | 0.040804
Yacht | dgp-1 | 5 | 0.005899 | 0.005309 | -0.908563 | 0.095456
Yacht | dgp-2 | 5 | 0.002389 | 0.002963 | -1.084093 | 0.071270
Yacht | dgp-3 | 5 | 0.002420 | 0.002879 | -1.085658 | 0.069810