PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning

Authors: Neeratyoy Mallik, Edward Bergman, Carl Hvarfner, Danny Stoll, Maciej Janowski, Marius Lindauer, Luigi Nardi, Frank Hutter

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Empirically, we demonstrate PriorBand's efficiency across a range of DL benchmarks and show its gains under informative expert input and robustness against poor expert beliefs." |
| Researcher Affiliation | Collaboration | Neeratyoy Mallik (University of Freiburg, mallik@cs.uni-freiburg.de); Edward Bergman (University of Freiburg, bergmane@cs.uni-freiburg.de); Carl Hvarfner (Lund University, carl.hvarfner@cs.lth.se); Danny Stoll (University of Freiburg, stolld@cs.uni-freiburg.de); Maciej Janowski (University of Freiburg, janowski@cs.uni-freiburg.de); Marius Lindauer (Leibniz University Hannover, m.lindauer@ai.uni-hannover.de); Luigi Nardi (Lund University, Stanford University, DBtune; luigi.nardi@cs.lth.se); Frank Hutter (University of Freiburg, fh@cs.uni-freiburg.de) |
| Pseudocode | Yes | Algorithm 1: Sampling from Eπ (an illustrative sketch of such an ensemble sampler follows the table). |
| Open Source Code | Yes | "Our code for reproducing the experiments is open-sourced at https://github.com/automl/mf-prior-exp." |
| Open Datasets | Yes | "We curated a set of 12 benchmarks that cover a diverse set of search spaces, including mixed-type spaces and log-scaled hyperparameters, and a wide range of downstream tasks, e.g., language modeling, image classification, tabular data, a medical application, and translation. We select 4 of the PD1 benchmarks (4 HPs) [32] that train large models such as transformers with batch sizes commonly found on modern hardware, and fit surrogates on them. Further, we select 5 benchmarks from LCBench (7 HPs) [33, 34] and consider all 3 JAHS-Bench [35] surrogate benchmarks that offer a 14-dimensional mixed-type search space for tuning both the architecture and training hyperparameters." |
| Dataset Splits | No | The paper mentions 'validation loss' and 'validation error' as performance metrics and discusses multi-fidelity optimization over training epochs (fidelity z), but it does not specify how the datasets are split into training, validation, and test sets (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments, such as GPU models, CPU models, or memory configurations. It mentions '4 workers' but no hardware specifics. |
| Software Dependencies | No | The paper mentions using BOHB and implementing other algorithms, as well as concepts like Gaussian processes and Expected Improvement, but it does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | "For all the HB-based algorithms we use η = 3 and the fidelity bounds (zmin, zmax) as per the benchmark. For PriorBand, we fix p = 0.5 and σ = 0.25. For single-worker experiments, we report the mean validation error with standard error bars over 50 seeds; for multi-worker experiments, we use 10 seeds." (A sketch of the resulting fidelity rungs follows the table.) |
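
The paper's Algorithm 1 (sampling from the ensemble policy Eπ) is provided only as pseudocode. Below is a minimal NumPy sketch of what a three-part sampling ensemble of this shape can look like: each draw comes from either a uniform, a prior-centered, or an incumbent-centered distribution over the unit hypercube, here using σ = 0.25 as in the reported setup. The function name, the caller-supplied fixed mixture weights, and the clipped (rather than properly truncated) Gaussian are our illustrative assumptions; the paper's actual algorithm adapts the ensemble weights over the course of the run.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_from_ensemble(prior_center, weights, incumbent=None, sigma=0.25):
    """Draw one configuration in the unit hypercube [0, 1]^d from a
    three-way sampling ensemble: uniform / prior-based / incumbent-based.

    weights: (w_uniform, w_prior, w_incumbent); renormalized if the
    incumbent-based part is inactive because no incumbent exists yet.
    """
    dim = len(prior_center)
    w = np.asarray(weights, dtype=float)
    if incumbent is None:
        w[2] = 0.0                      # no incumbent found yet
    w = w / w.sum()
    part = rng.choice(3, p=w)
    if part == 0:                       # uniform sampling
        return rng.uniform(size=dim)
    center = prior_center if part == 1 else incumbent
    # Gaussian around the chosen center, clipped to the unit cube
    # (a stand-in for a properly truncated normal).
    return np.clip(rng.normal(loc=center, scale=sigma), 0.0, 1.0)

# Example: equal weights, before and after an incumbent is known.
prior = np.array([0.3, 0.7])
print(sample_from_ensemble(prior, (1/3, 1/3, 1/3)))
print(sample_from_ensemble(prior, (1/3, 1/3, 1/3), incumbent=np.array([0.4, 0.6])))
```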
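
To make the experiment-setup row concrete: with η = 3 and benchmark-specific fidelity bounds (zmin, zmax), HyperBand-style methods evaluate configurations at a geometric ladder of fidelity levels (rungs). The sketch below (function name and integer rounding are our choices) shows how the rungs follow from these three quantities.

```python
import math

def fidelity_rungs(z_min, z_max, eta=3):
    """Geometric fidelity levels (rungs) between z_min and z_max,
    spaced by the halving rate eta, as in HyperBand-style methods."""
    n = math.floor(math.log(z_max / z_min, eta) + 1e-9)
    return [round(z_max / eta**k) for k in range(n, -1, -1)]

# E.g. a benchmark with fidelity bounds (1, 81) and eta = 3:
print(fidelity_rungs(1, 81))  # [1, 3, 9, 27, 81]
```

Each successive-halving bracket then starts at one of these rungs and promotes roughly the top 1/η of its configurations to the next rung.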