PFNs4BO: In-Context Learning for Bayesian Optimization

Authors: Samuel Müller, Matthias Feurer, Noah Hollmann, Frank Hutter

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1."
Researcher Affiliation | Collaboration | "Samuel Müller¹,², Matthias Feurer¹, Noah Hollmann¹,²,³, Frank Hutter¹,². ¹University of Freiburg, Germany; ²Prior Labs; ³Charité Berlin University of Medicine, Germany."
Pseudocode | Yes | "Algorithm 1: Bayesian optimization with GPs or PFNs" (see the BO-loop sketch after the table)
Open Source Code | Yes | "We publish code alongside trained models at github.com/automl/PFNs4BO."
Open Datasets | Yes | "We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. ... HPO-B (Pineda Arango et al., 2021), ... Bayesmark (https://github.com/uber/bayesmark), ... PD1 (Wang et al., 2021)"
Dataset Splits | Yes | "To determine our global prior hyperparameters, we split off a set of 7 validation search spaces from our largest benchmark, HPO-B. ... As validation search spaces, we use HPO-B IDs: 5527, 5891, 5906, 5971, 6767, 6766, and 5860." (see the hold-out sketch after the table)
Hardware Specification | Yes | "Our final models, besides studies on smaller budgets like in Figure 11, trained for less than 24 hours on a cluster node with eight RTX 2080 Ti GPUs."
Software Dependencies | No | The paper mentions software such as GPyTorch and SciPy, but it does not pin version numbers or list the full set of dependencies needed to reproduce the environment.
Experiment Setup | Yes | "Our PFNs were trained with the standard PFN settings used by Müller et al. (2022): We use an embedding size of 512 and six layers in the transformer. Our models were trained with Adam (Kingma & Ba, 2015) and cosine annealing (Loshchilov & Hutter, 2017) without any special tricks. The lr was chosen based on simple grid searches for minimal training loss in {1e-3, 3e-4, 1e-4, 5e-5}." (see the training-configuration sketch after the table)
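The Pseudocode row points to Algorithm 1, the standard Bayesian-optimization loop in which a surrogate (a GP or a trained PFN) is conditioned on the observations so far and an acquisition function picks the next evaluation. The sketch below is a minimal illustration of that loop, not the authors' implementation: the `surrogate.predict` interface, the finite candidate pool, and the Expected Improvement acquisition are assumptions made for the example.

```python
# Minimal sketch of a BO loop over a finite candidate pool (Algorithm 1 style).
# `surrogate` stands in for either a GP or a trained PFN; its predict() interface
# (train X, train y, candidates) -> (mean, std) is an assumption for this example.
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_y):
    """Standard EI for minimization, with a small floor on sigma for stability."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(objective, surrogate, candidate_pool, n_init=5, budget=50, seed=0):
    """Minimize `objective` over `candidate_pool` using the surrogate's posterior."""
    rng = np.random.default_rng(seed)
    # Initial design: a few random evaluations.
    idx = rng.choice(len(candidate_pool), size=n_init, replace=False)
    X = candidate_pool[idx]
    y = np.array([objective(x) for x in X])

    for _ in range(budget - n_init):
        # Posterior over the pool, conditioned on the observations so far.
        mu, sigma = surrogate.predict(X, y, candidate_pool)  # assumed interface
        x_next = candidate_pool[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))

    return X[np.argmin(y)], y.min()
```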
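For the Dataset Splits row, the paper names seven HPO-B search-space IDs held out for tuning global prior hyperparameters. The snippet below only illustrates how such a hold-out could be expressed; the `{search_space_id: tasks}` dictionary layout is an assumption for illustration, not the actual HPO-B API.

```python
# Hedged sketch: holding out the seven HPO-B search spaces the paper uses to
# validate global prior hyperparameters. The dict-of-tasks layout is assumed.
VALIDATION_SPACE_IDS = {"5527", "5891", "5906", "5971", "6767", "6766", "5860"}


def split_search_spaces(all_spaces):
    """Split a {search_space_id: tasks} mapping into validation and remaining parts."""
    valid = {sid: t for sid, t in all_spaces.items() if sid in VALIDATION_SPACE_IDS}
    rest = {sid: t for sid, t in all_spaces.items() if sid not in VALIDATION_SPACE_IDS}
    return valid, rest
```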
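The Experiment Setup row lists the reported training settings: embedding size 512, six transformer layers, Adam with cosine annealing, and a learning rate picked from {1e-3, 3e-4, 1e-4, 5e-5} by grid search on the training loss. The sketch below mirrors those settings in plain PyTorch; the encoder architecture, prior-batch sampler, loss function, and step count are stand-ins, not the released PFN code.

```python
# Hedged sketch of the reported training configuration (embedding size 512,
# six transformer layers, Adam + cosine annealing, lr chosen by grid search).
import torch
import torch.nn as nn

EMB_SIZE, N_LAYERS, N_STEPS = 512, 6, 10_000  # step count is a placeholder
LR_GRID = [1e-3, 3e-4, 1e-4, 5e-5]  # reported grid; best lr picked by training loss


def build_model():
    # Stand-in encoder; the actual PFN architecture lives in the released code.
    layer = nn.TransformerEncoderLayer(d_model=EMB_SIZE, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=N_LAYERS)


def train_one_lr(lr, sample_prior_batch, loss_fn, n_steps=N_STEPS):
    """Train one model with Adam + cosine annealing; both callables are assumed."""
    model = build_model()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=n_steps)
    for _ in range(n_steps):
        x, y = sample_prior_batch()          # synthetic tasks drawn from the prior
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    return model
```

As reported, one model would be trained per entry of LR_GRID and the learning rate yielding the lowest training loss kept.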