Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models
Authors: Rui Li, S. T. John, Arno Solin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare VI, EP, Laplace approximation, and our proposed training procedure and empirically demonstrate the effectiveness of our proposal across a wide range of data sets. |
| Researcher Affiliation | Academia | Rui Li¹, S. T. John¹, Arno Solin¹. ¹Department of Computer Science, Aalto University, Finland, and Finnish Center for Artificial Intelligence (FCAI). Correspondence to: Rui Li <rui.li@aalto.fi>. |
| Pseudocode | Yes | Algorithm 1 Training procedure for improved hyperparameter learning by a VEM-style iteration. |
| Open Source Code | Yes | We provide a reference implementation of the methods and code to reproduce the experiments at https://github.com/AaltoML/improved-hyperparameter-learning. |
| Open Datasets | Yes | We consider binary classification with a Bernoulli likelihood on 27 data sets from the UCI repository (Dua & Graff, 2017). [...] We use the Bayesian Benchmarks suite (github.com/secondmind-labs/bayesian_benchmarks) for evaluating the methods. |
| Dataset Splits | Yes | We conduct 5-fold cross-validation and use test set accuracy and log predictive density to evaluate the test performance of each method (higher is better in both). To reduce the variance introduced by the training–test set split, we repeat the 5-fold CV with ten different seeds. |
| Hardware Specification | Yes | All experiments ran on a cluster, which allowed us to parallelize jobs. This played a central role especially for the MCMC baseline results for the marginal likelihood surfaces, where we split into 441 separate jobs (per hyperparameter value combination), each of which were allocated 1–3 CPU cores and 1 GB memory and ran 8–40 h depending on data set size. |
| Software Dependencies | No | The paper mentions software like 'GPflow (Matthews et al., 2017)', 'GPy (GPy, since 2012)', and 'GPML toolbox (Rasmussen & Nickisch, 2010)'. While it references the sources of these tools, it does not provide specific version numbers for the software dependencies themselves, such as 'GPflow 1.9' or 'GPy 1.10'. |
| Experiment Setup | Yes | For LA and EP, the hyperparameters are optimized by the default optimizer L-BFGS-B in GPy. For VI and our hybrid training procedure, each E step and M step consists of 20 iterations. In the E-step we set the learning rate of natural gradient descent to be 0.1. In the M-step we use the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.01. We use the convergence criterion described in the main text, with a maximum of 10 000 steps. |
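
For context, the Experiment Setup and Pseudocode rows describe a VEM-style loop (Algorithm 1 in the paper): an E-step that fits the variational posterior with natural gradient descent (learning rate 0.1, 20 iterations) and an M-step that updates the hyperparameters with Adam (learning rate 0.01, 20 iterations), subject to a convergence criterion and a 10 000-step cap. The snippet below is a minimal sketch of such an alternating loop in GPflow 2.x with the reported settings; the placeholder data `X`, `Y`, the Matérn-5/2 kernel, and the simple step cap are assumptions for illustration. It shows only the plain VI variant of the loop, not the paper's hybrid M-step objective; for the authors' actual procedure see their repository linked above.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Placeholder binary-classification data (assumption for illustration).
X = np.random.randn(100, 2)
Y = (np.random.rand(100, 1) > 0.5).astype(float)

# Non-conjugate GP classification model; the kernel choice is an assumption.
model = gpflow.models.VGP(
    (X, Y),
    kernel=gpflow.kernels.Matern52(),
    likelihood=gpflow.likelihoods.Bernoulli(),
)

# E-step optimizer: natural gradients on the variational parameters (lr 0.1).
natgrad = gpflow.optimizers.NaturalGradient(gamma=0.1)
variational_params = [(model.q_mu, model.q_sqrt)]

# M-step optimizer: Adam on the remaining (hyper)parameters (lr 0.01).
# Mark the variational parameters non-trainable so Adam does not touch them.
gpflow.utilities.set_trainable(model.q_mu, False)
gpflow.utilities.set_trainable(model.q_sqrt, False)
adam = tf.optimizers.Adam(learning_rate=0.01)

loss = model.training_loss_closure()  # negative ELBO for VGP

MAX_STEPS = 10_000  # step cap; the paper combines this with a convergence criterion
INNER_ITERS = 20    # iterations per E-step and per M-step, as reported

for step in range(MAX_STEPS):
    # E-step: refit the variational posterior for fixed hyperparameters.
    for _ in range(INNER_ITERS):
        natgrad.minimize(loss, variational_params)
    # M-step: update kernel hyperparameters for the fixed posterior.
    for _ in range(INNER_ITERS):
        adam.minimize(loss, var_list=model.trainable_variables)
```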