PFNs4BO: In-Context Learning for Bayesian Optimization

Authors: Samuel Müller, Matthias Feurer, Noah Hollmann, Frank Hutter

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1."
Researcher Affiliation | Collaboration | "Samuel Müller¹,², Matthias Feurer¹, Noah Hollmann¹,²,³, Frank Hutter¹,². ¹University of Freiburg, Germany; ²Prior Labs; ³Charité Berlin University of Medicine, Germany."
Pseudocode | Yes | "Algorithm 1: Bayesian optimization with GPs or PFNs" (see the BO-loop sketch after the table)
Open Source Code | Yes | "We publish code alongside trained models at github.com/automl/PFNs4BO."
Open Datasets | Yes | "We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. ... HPO-B (Pineda Arango et al., 2021), ... Bayesmark (https://github.com/uber/bayesmark), ... PD1 (Wang et al., 2021)"
Dataset Splits | Yes | "To determine our global prior hyperparameters, we split off a set of 7 validation search spaces from our largest benchmark, HPO-B. ... As validation search spaces, we use HPO-B IDs: 5527, 5891, 5906, 5971, 6767, 6766, and 5860." (see the hold-out sketch after the table)
Hardware Specification | Yes | "Our final models, besides studies on smaller budgets like in Figure 11, trained for less than 24 hours on a cluster node with eight RTX 2080 Ti GPUs."
Software Dependencies | No | The paper mentions software such as GPyTorch and SciPy, but it does not pin version numbers or list the full set of dependencies needed to reproduce the environment.
Experiment Setup | Yes | "Our PFNs were trained with the standard PFN settings used by Müller et al. (2022): We use an embedding size of 512 and six layers in the transformer. Our models were trained with Adam (Kingma & Ba, 2015) and cosine annealing (Loshchilov & Hutter, 2017) without any special tricks. The lr was chosen based on simple grid searches for minimal training loss in {1e-3, 3e-4, 1e-4, 5e-5}." (see the training-configuration sketch after the table)
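The Pseudocode row points to Algorithm 1, the standard Bayesian-optimization loop in which a surrogate (a GP or a trained PFN) is conditioned on the observations so far and an acquisition function picks the next evaluation. The sketch below is a minimal illustration of that loop, not the authors' implementation: the `surrogate.predict` interface, the finite candidate pool, and the Expected Improvement acquisition are assumptions made for the example.

```python
# Minimal sketch of a BO loop over a finite candidate pool (Algorithm 1 style).
# `surrogate` stands in for either a GP or a trained PFN; its predict() interface
# (train X, train y, candidates) -> (mean, std) is an assumption for this example.
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_y):
    """Standard EI for minimization, with a small floor on sigma for stability."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(objective, surrogate, candidate_pool, n_init=5, budget=50, seed=0):
    """Minimize `objective` over `candidate_pool` using the surrogate's posterior."""
    rng = np.random.default_rng(seed)
    # Initial design: a few random evaluations.
    idx = rng.choice(len(candidate_pool), size=n_init, replace=False)
    X = candidate_pool[idx]
    y = np.array([objective(x) for x in X])

    for _ in range(budget - n_init):
        # Posterior over the pool, conditioned on the observations so far.
        mu, sigma = surrogate.predict(X, y, candidate_pool)  # assumed interface
        x_next = candidate_pool[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))

    return X[np.argmin(y)], y.min()
```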
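For the Dataset Splits row, the paper names seven HPO-B search-space IDs held out for tuning global prior hyperparameters. The snippet below only illustrates how such a hold-out could be expressed; the `{search_space_id: tasks}` dictionary layout is an assumption for illustration, not the actual HPO-B API.

```python
# Hedged sketch: holding out the seven HPO-B search spaces the paper uses to
# validate global prior hyperparameters. The dict-of-tasks layout is assumed.
VALIDATION_SPACE_IDS = {"5527", "5891", "5906", "5971", "6767", "6766", "5860"}


def split_search_spaces(all_spaces):
    """Split a {search_space_id: tasks} mapping into validation and remaining parts."""
    valid = {sid: t for sid, t in all_spaces.items() if sid in VALIDATION_SPACE_IDS}
    rest = {sid: t for sid, t in all_spaces.items() if sid not in VALIDATION_SPACE_IDS}
    return valid, rest
```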
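The Experiment Setup row lists the reported training settings: embedding size 512, six transformer layers, Adam with cosine annealing, and a learning rate picked from {1e-3, 3e-4, 1e-4, 5e-5} by grid search on the training loss. The sketch below mirrors those settings in plain PyTorch; the encoder architecture, prior-batch sampler, loss function, and step count are stand-ins, not the released PFN code.

```python
# Hedged sketch of the reported training configuration (embedding size 512,
# six transformer layers, Adam + cosine annealing, lr chosen by grid search).
import torch
import torch.nn as nn

EMB_SIZE, N_LAYERS, N_STEPS = 512, 6, 10_000  # step count is a placeholder
LR_GRID = [1e-3, 3e-4, 1e-4, 5e-5]  # reported grid; best lr picked by training loss


def build_model():
    # Stand-in encoder; the actual PFN architecture lives in the released code.
    layer = nn.TransformerEncoderLayer(d_model=EMB_SIZE, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=N_LAYERS)


def train_one_lr(lr, sample_prior_batch, loss_fn, n_steps=N_STEPS):
    """Train one model with Adam + cosine annealing; both callables are assumed."""
    model = build_model()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=n_steps)
    for _ in range(n_steps):
        x, y = sample_prior_batch()          # synthetic tasks drawn from the prior
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    return model
```

As reported, one model would be trained per entry of LR_GRID and the learning rate yielding the lowest training loss kept.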