PFNs4BO: In-Context Learning for Bayesian Optimization
Authors: Samuel Müller, Matthias Feurer, Noah Hollmann, Frank Hutter
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. |
| Researcher Affiliation | Collaboration | Samuel Müller (1,2), Matthias Feurer (1), Noah Hollmann (1,2,3), Frank Hutter (1,2). 1: University of Freiburg, Germany; 2: Prior Labs; 3: Charité Berlin University of Medicine, Germany. |
| Pseudocode | Yes (see the BO-loop sketch below the table) | Algorithm 1 Bayesian optimization with GPs or PFNs |
| Open Source Code | Yes | We publish code alongside trained models at github.com/automl/PFNs4BO. |
| Open Datasets | Yes | We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. ... HPO-B (Pineda Arango et al., 2021), ... Bayesmark (https://github.com/uber/bayesmark), ... PD1 (Wang et al., 2021) |
| Dataset Splits | Yes | To determine our global prior hyperparameters, we split off a set of 7 search spaces from our largest benchmark, HPO-B, as validation search spaces. ... As validation search spaces, we use HPO-B IDs: 5527, 5891, 5906, 5971, 6767, 6766, and 5860. |
| Hardware Specification | Yes | Our final models, besides studies on smaller budgets like in Figure 11, trained for less than 24 hours on a cluster node with eight RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions software like GPyTorch and scipy, but does not specify their version numbers or other crucial software dependencies with specific versions required for reproducibility. |
| Experiment Setup | Yes (see the training-setup sketch below the table) | Our PFNs were trained with the standard PFN settings used by Müller et al. (2022): We use an embedding size of 512 and six layers in the transformer. Our models were trained with Adam (Kingma & Ba, 2015) and cosine annealing (Loshchilov & Hutter, 2017) without any special tricks. The learning rate was chosen based on simple grid searches for minimal training loss in {1e-3, 3e-4, 1e-4, 5e-5}. |
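The "Pseudocode" row refers to Algorithm 1 of the paper, a standard Bayesian optimization loop in which the surrogate posterior comes either from a GP or from a PFN. The following is a minimal, hedged sketch of such a loop; the `surrogate_fit_predict` interface, the random candidate pool used to optimize the acquisition, and all parameter names are illustrative assumptions, not the paper's released API.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_y):
    # Expected improvement for minimization under a Gaussian predictive distribution.
    sigma = np.maximum(sigma, 1e-12)
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_loop(objective, surrogate_fit_predict, bounds, n_init=5, n_iters=20,
            n_cand=512, seed=0):
    # Generic BO loop: the surrogate (GP or PFN) only needs to map
    # (X_observed, y_observed, X_candidates) -> (posterior mean, posterior std).
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    X = lo + (hi - lo) * rng.random((n_init, d))         # initial random design
    y = np.array([objective(x) for x in X])
    for _ in range(n_iters):
        cand = lo + (hi - lo) * rng.random((n_cand, d))  # candidate pool
        mu, sigma = surrogate_fit_predict(X, y, cand)    # posterior at candidates
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X, y
```

A GP surrogate would fit a model to `(X, y)` and return its posterior mean and standard deviation at the candidates; a PFN produces the same quantities in a single forward pass by conditioning on `(X, y)` in-context, which is the substitution the paper studies.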
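The "Experiment Setup" row reports the PFN training configuration. The sketch below assembles those reported settings (embedding size 512, six transformer layers, Adam, cosine annealing, the stated learning-rate grid) in PyTorch; the `build_pfn` architecture is a placeholder assumption and differs from the actual PFN implementation released at github.com/automl/PFNs4BO.

```python
import torch
import torch.nn as nn

def build_pfn(n_features, emb_size=512, n_layers=6, n_heads=4):
    # Placeholder encoder-only transformer with the reported width/depth;
    # the real PFN input encoding and output head are different.
    layer = nn.TransformerEncoderLayer(d_model=emb_size, nhead=n_heads,
                                       batch_first=True)
    return nn.Sequential(
        nn.Linear(n_features, emb_size),
        nn.TransformerEncoder(layer, num_layers=n_layers),
        nn.Linear(emb_size, 1),
    )

def make_optimizer(model, lr, total_steps):
    # Adam with a cosine-annealed learning rate, as reported in the paper.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)
    return opt, sched

# Learning-rate grid searched for minimal training loss (from the quote above).
lr_grid = [1e-3, 3e-4, 1e-4, 5e-5]
```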