Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models

Authors: Rui Li, S. T. John, Arno Solin

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare VI, EP, Laplace approximation, and our proposed training procedure and empirically demonstrate the effectiveness of our proposal across a wide range of data sets.
Researcher Affiliation | Academia | Rui Li¹, S. T. John¹, Arno Solin¹; ¹Department of Computer Science, Aalto University, Finland, and Finnish Center for Artificial Intelligence (FCAI). Correspondence to: Rui Li <rui.li@aalto.fi>.
Pseudocode | Yes | Algorithm 1: Training procedure for improved hyperparameter learning by a VEM-style iteration. (A minimal training-loop sketch follows the table.)
Open Source Code | Yes | We provide a reference implementation of the methods and code to reproduce the experiments at https://github.com/AaltoML/improved-hyperparameter-learning.
Open Datasets | Yes | We consider binary classification with a Bernoulli likelihood on 27 data sets from the UCI repository (Dua & Graff, 2017). [...] We use the Bayesian Benchmarks suite (github.com/secondmind-labs/bayesian_benchmarks) for evaluating the methods.
Dataset Splits | Yes | We conduct 5-fold cross-validation and use test set accuracy and log predictive density to evaluate the test performance of each method (higher is better in both). To reduce the variance introduced by the training–test split, we repeat the 5-fold CV with ten different seeds. (A cross-validation sketch follows the table.)
Hardware Specification | Yes | All experiments ran on a cluster, which allowed us to parallelize jobs. This played a central role especially for the MCMC baseline results for the marginal likelihood surfaces, where we split into 441 separate jobs (per hyperparameter value combination), each of which was allocated 1–3 CPU cores and 1 GB of memory and ran 8–40 h depending on data set size.
Software Dependencies | No | The paper mentions software such as GPflow (Matthews et al., 2017), GPy (GPy, since 2012), and the GPML toolbox (Rasmussen & Nickisch, 2010). While it references the sources of these tools, it does not provide specific version numbers for the software dependencies themselves, such as 'GPflow 1.9' or 'GPy 1.10'.
Experiment Setup | Yes | For LA and EP, the hyperparameters are optimized by the default optimizer L-BFGS-B in GPy. For VI and our hybrid training procedure, each E-step and M-step consists of 20 iterations. In the E-step we set the learning rate of natural gradient descent to 0.1. In the M-step we use the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.01. We use the convergence criterion described in the main text, with a maximum of 10 000 steps. (These settings also appear in the training-loop sketch below.)
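The VEM-style alternation quoted in the Pseudocode and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration using GPflow's VGP model and NaturalGradient optimizer on hypothetical toy data; it is not the authors' reference implementation (see their repository linked above), and the data, kernel choice, and loop bounds are assumptions made purely for illustration.

```python
# Sketch of a VEM-style alternation for hyperparameter learning:
# E-step: natural-gradient updates of the variational distribution q(f),
# M-step: Adam updates of the kernel/likelihood hyperparameters.
# Toy data, Matern-5/2 kernel, and loop bounds are illustrative assumptions.
import numpy as np
import tensorflow as tf
import gpflow

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 1))
Y = (rng.uniform(size=(100, 1)) > 0.5).astype(float)  # hypothetical binary labels

model = gpflow.models.VGP(
    (X, Y),
    kernel=gpflow.kernels.Matern52(),
    likelihood=gpflow.likelihoods.Bernoulli(),
)

# Variational parameters are updated by natural gradients only,
# so exclude them from the Adam (M-step) variable set.
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)

natgrad = gpflow.optimizers.NaturalGradient(gamma=0.1)  # E-step step size 0.1
adam = tf.optimizers.Adam(learning_rate=0.01)           # M-step learning rate 0.01

for outer_iteration in range(50):  # the paper caps the total at 10 000 steps
    for _ in range(20):            # E-step: 20 natural-gradient iterations on q(f)
        natgrad.minimize(model.training_loss, [(model.q_mu, model.q_sqrt)])
    for _ in range(20):            # M-step: 20 Adam iterations on the hyperparameters
        adam.minimize(model.training_loss, model.trainable_variables)
```

A convergence check on the training loss (as described in the paper's main text) would replace the fixed outer-loop count in practice.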
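The evaluation protocol in the Dataset Splits row (5-fold cross-validation repeated with ten seeds, reporting accuracy and log predictive density) could look roughly like the sketch below. The `fit_and_score` callback and the scikit-learn splitter are hypothetical stand-ins; the paper itself evaluates through the Bayesian Benchmarks suite.

```python
# Rough sketch of the evaluation protocol: 5-fold CV repeated with ten seeds.
# `fit_and_score` is a hypothetical callback that trains a model and returns
# (test accuracy, test log predictive density); it is not from the paper's code.
import numpy as np
from sklearn.model_selection import StratifiedKFold


def repeated_cv(fit_and_score, X, y, n_splits=5, n_repeats=10):
    scores = []
    for seed in range(n_repeats):                   # ten different seeds
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for train_idx, test_idx in cv.split(X, y):  # 5 folds per seed
            scores.append(
                fit_and_score(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
            )
    # Average accuracy and log predictive density over all 50 train/test splits.
    return np.mean(scores, axis=0)
```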