Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Pre-trained Gaussian Processes for Bayesian Optimization
Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks. Keywords: Bayesian optimization, Gaussian processes, pre-trained models, transfer learning, hyperparameter tuning |
| Researcher Affiliation | Industry | Zi Wang EMAIL George E. Dahl EMAIL Kevin Swersky EMAIL Chansoo Lee EMAIL Zachary Nado EMAIL Justin Gilmer EMAIL Jasper Snoek EMAIL Zoubin Ghahramani EMAIL Google DeepMind |
| Pseudocode | Yes | Algorithm 1 HyperBO for optimizing unknown function f. 1: function HYPERBO(f, D_N) 2: GP(μ̂, k̂, σ̂²) ← PRE-TRAIN(D_N) ▷ Pre-train a GP on training dataset D_N (§5). 4: for t = 1, ..., T do 5: x_t ← argmax_{x∈X} α(x; GP(μ̂, k̂, σ̂² \| D_f)) ▷ Optimize the acquisition function α(·). 6: y_t ← OBSERVE(f(x_t)) ▷ Collect noisy output of function f at input x_t. 7: D_f ← D_f ∪ {(x_t, y_t)} 9: return D_f 10: end function |
| Open Source Code | Yes | Together with our open-sourced code for HyperBO, the released dataset ensures the reproducibility of our work1. More importantly, the dataset provides a realistic benchmark for multi-task BO, with open opportunities to explore detailed metrics for each training step and other auxiliary information. 1. Both open-sourced code and dataset are available at https://github.com/google-research/hyperbo. Our JAX-based (Bradbury et al., 2018) implementation of HyperBO can be found at https://github.com/google-research/hyperbo, which was used for all of our experiments. To accommodate needs for more modular use cases, we also provide a Flax (Heek et al., 2020) and TensorFlow Probability (Dillon et al., 2017) based implementation for GP pre-training at https://github.com/google-research/gpax. |
| Open Datasets | Yes | 5. We open-sourced the first large multi-task hyperparameter tuning dataset for modern deep learning models... Together with our open-sourced code for HyperBO, the released dataset ensures the reproducibility of our work1. More importantly, the dataset provides a realistic benchmark for multi-task BO, with open opportunities to explore detailed metrics for each training step and other auxiliary information. 1. Both open-sourced code and dataset are available at https://github.com/google-research/hyperbo. Please download the dataset (http://storage.googleapis.com/gresearch/pint/pd1.tar.gz) and see its descriptions for additional details about the tasks and training procedure. |
| Dataset Splits | Yes | For each test task, we used subsets of the other 23 tasks (including ImageNet ResNet50 1024) to compose training datasets. The HPO-B benchmark is a machine learning hyperparameter tuning dataset, which includes about 6 million evaluations of hyperparameters from 16 search spaces of different models. Each search space has a different set of hyperparameters with dimensions ranging from 2 to 18. There are multiple tasks in each search space, which are divided into training and test tasks. In total, there are 86 test tasks. |
| Hardware Specification | Yes | The dataset used roughly 12,000 machine-days of computation on TPUv4i (Jouppi et al., 2021) for approximately 50,000 hyperparameter evaluations. |
| Software Dependencies | Yes | Our JAX-based (Bradbury et al., 2018) implementation of HyperBO can be found at https://github.com/google-research/hyperbo, which was used for all of our experiments. To accommodate needs for more modular use cases, we also provide a Flax (Heek et al., 2020) and TensorFlow Probability (Dillon et al., 2017) based implementation for GP pre-training at https://github.com/google-research/gpax. The NLL objective was optimized with the Adam optimizer (Kingma and Ba, 2015) implemented in Optax (Babuschkin et al., 2020) with a 10⁻³ learning rate, 50,000 training steps, and a batch size of 50, as recommended by Wistuba and Grabocka (2021). |
| Experiment Setup | Yes | We used a 2-hidden-layer neural network of size (32, 32) as mean function and an anisotropic Matérn 5/2 covariance on the last feature layer of the mean function as kernel. We used tanh activation for the neural network. The NLL objective was optimized with the Adam optimizer (Kingma and Ba, 2015) implemented in Optax (Babuschkin et al., 2020) with a 10⁻³ learning rate, 50,000 training steps, and a batch size of 50, as recommended by Wistuba and Grabocka (2021). |
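The extracted pseudocode (Algorithm 1) describes a standard Bayesian-optimization loop run on top of a pre-trained GP. A minimal plain-Python sketch of that loop is below; this is an illustration, not the authors' JAX implementation, and the `acquisition` callable (which would wrap the pre-trained GP posterior) is a hypothetical stand-in:

```python
import numpy as np

def hyperbo_loop(f, candidates, acquisition, T):
    """Sketch of Algorithm 1: after pre-training a GP on related tasks,
    repeatedly maximize an acquisition function over candidates and
    observe f. The pre-trained GP is assumed to live inside `acquisition`,
    which scores a candidate x given the observations D_f so far."""
    D_f = []  # observations collected on the target task
    for _ in range(T):
        # x_t <- argmax_x alpha(x; GP | D_f): pick the best-scoring candidate.
        scores = [acquisition(x, D_f) for x in candidates]
        x_t = candidates[int(np.argmax(scores))]
        y_t = f(x_t)  # noisy observation of f at x_t
        D_f.append((x_t, y_t))
    return D_f
```

With a toy objective and a trivial acquisition function, each iteration appends one `(x_t, y_t)` pair, mirroring steps 4-7 of the algorithm.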
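The pre-training step the Experiment Setup row describes is hyperparameter fitting by minimizing the GP's negative log marginal likelihood (NLL) summed over training tasks. The NumPy sketch below shows that objective in its simplest form; the isotropic squared-exponential kernel, zero mean, and grid search here are deliberate simplifications standing in for the paper's NN-mean Matérn 5/2 GP trained with Adam in Optax:

```python
import numpy as np

def gp_nll(params, X, y):
    """Negative log marginal likelihood of a zero-mean GP with an
    isotropic squared-exponential kernel (a simplified stand-in for
    the paper's NN mean + Matern 5/2 kernel)."""
    log_ls, log_noise = params
    ls, noise = np.exp(log_ls), np.exp(log_noise)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq / ls ** 2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 y^T K^{-1} y + 0.5 log|K| + (n/2) log(2 pi)
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

def pretrain(datasets, grid):
    """Pre-train by minimizing the NLL summed across tasks. The paper
    uses Adam (Optax) for 50,000 steps; a toy grid search is used here
    purely to keep the sketch dependency-free."""
    return min(grid, key=lambda p: sum(gp_nll(p, X, y) for X, y in datasets))
```

The returned parameters define the pre-trained GP that Algorithm 1 then conditions on target-task observations.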