In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
Authors: Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Eddie Bergman, Frank Hutter
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical analysis across three benchmark suites shows that the predictions made by FT-PFN are more accurate and 10-100 times faster than those of the deep Gaussian process and deep ensemble surrogates used in previous work. |
| Researcher Affiliation | Academia | Machine Learning Lab, University of Freiburg, Germany; ELLIS Institute Tübingen. |
| Pseudocode | Yes | Algorithm 1: Freeze-thaw Bayesian Optimization (a sketch of this loop follows the table). |
| Open Source Code | Yes | The code for the surrogate PFN training and for reproducing the experiments from this paper is available at: https://github.com/automl/ifBO. |
| Open Datasets | Yes | We conduct our experiments on three benchmarks: LCBench (Zimmer et al., 2021), PD1 (Wang et al., 2021), and Taskset (Metz et al., 2020). |
| Dataset Splits | Yes | A single meta-training example in our setting corresponds to a training set $D_{\text{train}}$ and test set $D_{\text{test}}$, where $D_{\text{train}} = \bigcup_{\lambda \in \Lambda} \{((\lambda, b/b_{\max}), \pi_{\text{curve}}(\lambda, b/b_{\max}))\}_{b=1}^{b_\lambda}$ corresponds to the (synthetic) partial learning curves observed thus far (i.e., the analog of $H$ at test time) and $D_{\text{test}} = \bigcup_{\lambda \in \Lambda} \{((\lambda, b/b_{\max}), \pi_{\text{curve}}(\lambda, b/b_{\max}))\}_{b=b_\lambda}^{b_{\max}}$ the extrapolation targets we want FT-PFN to predict. To keep the input size of FT-PFN fixed, we choose $\lvert D_{\text{train}} \rvert + \lvert D_{\text{test}} \rvert = N = 1{,}000$ and vary the size $\lvert D_{\text{train}} \rvert \sim U(0, N-1)$ (see the splitting sketch after the table). |
| Hardware Specification | Yes | The evaluation was run on a single Intel Xeon 6242 CPU. Training took roughly 8 GPU hours on an RTX2080 GPU and the same FT-PFN is used in all experiments described in Section 5, without any retraining/fine-tuning. |
| Software Dependencies | No | The paper mentions using a 'sequence Transformer', 'Adam optimizer', and 'cosine annealing', but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We use a standard training procedure for all experiments, minimizing the cross-entropy loss from Equation 1 on 2.0M synthetic datasets generated as described in Section A.2, using the Adam optimizer (Kingma et al., 2015) (learning rate 0.0001, batch size 25) with cosine annealing (Loshchilov & Hutter, 2017) and a linear warmup over the first 25% of training (see the training sketch after the table). |
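
The pseudocode cell above refers to the paper's Algorithm 1 but does not reproduce it. As a rough, non-authoritative illustration, the Python sketch below shows the generic freeze-thaw loop such an algorithm follows: a surrogate conditioned on partial learning curves decides whether to continue a paused configuration or start a new one. The names `config_space`, `surrogate`, `acquisition`, and `run_one_step` are hypothetical placeholders, not the ifBO API.

```python
# Hedged sketch of a generic freeze-thaw Bayesian optimization loop.
# `config_space`, `surrogate`, `acquisition`, and `run_one_step` are
# hypothetical placeholders, not the actual ifBO implementation.

def freeze_thaw_bo(config_space, surrogate, acquisition, run_one_step, budget):
    history = []   # observed (config, step, value) triples, i.e. partial curves
    paused = {}    # "frozen" configurations -> number of steps trained so far

    for _ in range(budget):
        # Condition the surrogate on all partial learning curves seen so far.
        surrogate.fit(history)

        # Candidates: continue any paused configuration, or start a fresh one.
        candidates = list(paused) + [config_space.sample() for _ in range(10)]

        # Pick the candidate whose predicted curve continuation scores best
        # under the acquisition function.
        chosen = max(candidates,
                     key=lambda c: acquisition(surrogate, c, paused.get(c, 0)))

        # "Thaw" the chosen configuration: train it for one more unit of budget.
        step = paused.get(chosen, 0) + 1
        value = run_one_step(chosen, step)

        history.append((chosen, step, value))
        paused[chosen] = step  # freeze it again until it is chosen next

    return history
```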
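
The dataset-splits quote fixes the FT-PFN context at N = 1,000 points per meta-training example, splits each synthetic curve into an observed prefix and an extrapolation suffix, and draws the observed size uniformly. The sketch below shows one simple way to realize that split; `sample_task` is a hypothetical stand-in for the Section A.2 curve prior, and the greedy allocation of observed steps across configurations is a simplification, not the authors' exact procedure.

```python
# Hedged sketch of splitting one synthetic task into D_train / D_test for FT-PFN
# meta-training, with |D_train| + |D_test| = N = 1000 and |D_train| ~ U(0, N-1).
# `sample_task` is a hypothetical stand-in for the paper's curve prior (Section A.2).

import numpy as np

N = 1000  # fixed FT-PFN context size


def make_meta_training_example(sample_task, rng=np.random.default_rng()):
    # sample_task() returns, for each configuration lambda, its full learning curve
    # as a list of ((lambda, b / b_max), pi_curve) pairs for b = 1, ..., b_max,
    # with N points in total across all configurations.
    curves = sample_task(total_points=N)

    n_train = int(rng.integers(0, N))  # |D_train| ~ U(0, N-1)

    D_train, D_test = [], []
    remaining = n_train
    for curve in curves:
        # Observe a prefix of this curve (its first b_lambda steps); the rest of
        # the curve becomes extrapolation targets. The prefix budget is allocated
        # greedily here purely for illustration.
        b_lambda = min(len(curve), remaining) if remaining > 0 else 0
        D_train.extend(curve[:b_lambda])
        D_test.extend(curve[b_lambda:])
        remaining -= b_lambda

    return D_train, D_test
```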
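
The experiment-setup quote (Adam with learning rate 0.0001, batch size 25, cosine annealing with a linear warmup over the first 25% of training) maps onto standard PyTorch components. The sketch below is only an illustration of that optimizer and schedule under the stated hyperparameters, not the authors' training script; `model` and `train_loader` are placeholders, and scheduling is done per step here for simplicity.

```python
# Hedged sketch of the quoted setup: Adam (lr 1e-4, batch size 25 via the loader),
# cosine annealing, linear warmup over the first 25% of training steps.
# `model` and `train_loader` are placeholders for FT-PFN and the synthetic data.

import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR, CosineAnnealingLR, SequentialLR


def build_optimizer_and_scheduler(model, total_steps, warmup_frac=0.25, lr=1e-4):
    optimizer = Adam(model.parameters(), lr=lr)
    warmup_steps = int(warmup_frac * total_steps)

    # Linear warmup from ~0 up to the base learning rate ...
    warmup = LambdaLR(optimizer, lr_lambda=lambda s: (s + 1) / max(1, warmup_steps))
    # ... then cosine annealing over the remaining steps.
    cosine = CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps)
    scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine],
                             milestones=[warmup_steps])
    return optimizer, scheduler


def train(model, train_loader, total_steps, device="cpu"):
    optimizer, scheduler = build_optimizer_and_scheduler(model, total_steps)
    criterion = torch.nn.CrossEntropyLoss()  # cross-entropy over discretized targets
    model.train()
    step = 0
    for batch in train_loader:               # single pass shown for brevity
        inputs, targets = (t.to(device) for t in batch)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()                      # per-step schedule update
        step += 1
        if step >= total_steps:
            break
```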