Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Pre-trained Gaussian Processes for Bayesian Optimization
Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that on average, HyperBO is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods on both our new tuning dataset and existing multi-task BO benchmarks. Keywords: Bayesian optimization, Gaussian processes, pre-trained models, transfer learning, hyperparameter tuning |
| Researcher Affiliation | Industry | Zi Wang EMAIL George E. Dahl EMAIL Kevin Swersky EMAIL Chansoo Lee EMAIL Zachary Nado EMAIL Justin Gilmer EMAIL Jasper Snoek EMAIL Zoubin Ghahramani EMAIL Google DeepMind |
| Pseudocode | Yes | Algorithm 1 HyperBO for optimizing unknown function f. 1: function HYPERBO(f, D_N) 2: GP(μ̂, k̂, σ̂²) ← PRE-TRAIN(D_N) ▷ Pre-train a GP on training dataset D_N (§5). 4: for t = 1, ..., T do 5: x_t ← argmax_{x∈X} α(x; GP(μ̂, k̂, σ̂² \| D_f)) ▷ Optimize the acquisition function α(·). 6: y_t ← OBSERVE(f(x_t)) ▷ Collect noisy output of function f at input x_t. 7: D_f ← D_f ∪ {(x_t, y_t)} 9: return D_f 10: end function |
| Open Source Code | Yes | Together with our open-sourced code for HyperBO, the released dataset ensures the reproducibility of our work1. More importantly, the dataset provides a realistic benchmark for multi-task BO, with open opportunities to explore detailed metrics for each training step and other auxiliary information. 1. Both open-sourced code and dataset are available at https://github.com/google-research/hyperbo. Our JAX-based (Bradbury et al., 2018) implementation of HyperBO can be found at https://github.com/google-research/hyperbo, which was used for all of our experiments. To accommodate needs for more modular use cases, we also provide a Flax (Heek et al., 2020) and TensorFlow Probability (Dillon et al., 2017) based implementation for GP pre-training at https://github.com/google-research/gpax. |
| Open Datasets | Yes | 5. We open-sourced the first large multi-task hyperparameter tuning dataset for modern deep learning models... Together with our open-sourced code for HyperBO, the released dataset ensures the reproducibility of our work1. More importantly, the dataset provides a realistic benchmark for multi-task BO, with open opportunities to explore detailed metrics for each training step and other auxiliary information. 1. Both open-sourced code and dataset are available at https://github.com/google-research/hyperbo. Please download the dataset (http://storage.googleapis.com/gresearch/pint/pd1.tar.gz) and see its descriptions for additional details about the tasks and training procedure. |
| Dataset Splits | Yes | For each test task, we used subsets of the other 23 tasks (including ImageNet ResNet50 1024) to compose training datasets. The HPO-B benchmark is a machine learning hyperparameter tuning dataset, which includes about 6 million evaluations of hyperparameters from 16 search spaces of different models. Each search space has a different set of hyperparameters with dimensions ranging from 2 to 18. There are multiple tasks in each search space, which are divided into training and test tasks. In total, there are 86 test tasks. |
| Hardware Specification | Yes | The dataset used roughly 12,000 machine-days of computation on TPUv4i (Jouppi et al., 2021) for approximately 50,000 hyperparameter evaluations. |
| Software Dependencies | Yes | Our JAX-based (Bradbury et al., 2018) implementation of HyperBO can be found at https://github.com/google-research/hyperbo, which was used for all of our experiments. To accommodate needs for more modular use cases, we also provide a Flax (Heek et al., 2020) and TensorFlow Probability (Dillon et al., 2017) based implementation for GP pre-training at https://github.com/google-research/gpax. The NLL objective was optimized with the Adam optimizer (Kingma and Ba, 2015) implemented in Optax (Babuschkin et al., 2020) with a 10⁻³ learning rate, 50,000 training steps, and a batch size of 50, as recommended by Wistuba and Grabocka (2021). |
| Experiment Setup | Yes | We used a 2-hidden-layer neural network of size (32, 32) as mean function and an anisotropic Matérn 5/2 covariance on the last feature layer of the mean function as kernel. We used tanh activation for the neural network. The NLL objective was optimized with the Adam optimizer (Kingma and Ba, 2015) implemented in Optax (Babuschkin et al., 2020) with a 10⁻³ learning rate, 50,000 training steps, and a batch size of 50, as recommended by Wistuba and Grabocka (2021). |
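The extracted pseudocode (Algorithm 1) describes a standard Bayesian-optimization loop run on top of a pre-trained GP. A minimal plain-Python sketch of that loop is below; this is an illustration, not the authors' JAX implementation, and the `acquisition` callable (which would wrap the pre-trained GP posterior) is a hypothetical stand-in:

```python
import numpy as np

def hyperbo_loop(f, candidates, acquisition, T):
    """Sketch of Algorithm 1: after pre-training a GP on related tasks,
    repeatedly maximize an acquisition function over candidates and
    observe f. The pre-trained GP is assumed to live inside `acquisition`,
    which scores a candidate x given the observations D_f so far."""
    D_f = []  # observations collected on the target task
    for _ in range(T):
        # x_t <- argmax_x alpha(x; GP | D_f): pick the best-scoring candidate.
        scores = [acquisition(x, D_f) for x in candidates]
        x_t = candidates[int(np.argmax(scores))]
        y_t = f(x_t)  # noisy observation of f at x_t
        D_f.append((x_t, y_t))
    return D_f
```

With a toy objective and a trivial acquisition function, each iteration appends one `(x_t, y_t)` pair, mirroring steps 4-7 of the algorithm.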
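The pre-training step the Experiment Setup row describes is hyperparameter fitting by minimizing the GP's negative log marginal likelihood (NLL) summed over training tasks. The NumPy sketch below shows that objective in its simplest form; the isotropic squared-exponential kernel, zero mean, and grid search here are deliberate simplifications standing in for the paper's NN-mean Matérn 5/2 GP trained with Adam in Optax:

```python
import numpy as np

def gp_nll(params, X, y):
    """Negative log marginal likelihood of a zero-mean GP with an
    isotropic squared-exponential kernel (a simplified stand-in for
    the paper's NN mean + Matern 5/2 kernel)."""
    log_ls, log_noise = params
    ls, noise = np.exp(log_ls), np.exp(log_noise)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq / ls ** 2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 y^T K^{-1} y + 0.5 log|K| + (n/2) log(2 pi)
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

def pretrain(datasets, grid):
    """Pre-train by minimizing the NLL summed across tasks. The paper
    uses Adam (Optax) for 50,000 steps; a toy grid search is used here
    purely to keep the sketch dependency-free."""
    return min(grid, key=lambda p: sum(gp_nll(p, X, y) for X, y in datasets))
```

The returned parameters define the pre-trained GP that Algorithm 1 then conditions on target-task observations.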