BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Authors: Stefan Falkner, Aaron Klein, Frank Hutter

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive empirical evaluation (Section 5) demonstrates that our method combines the best aspects of Bayesian optimization and Hyperband: it often finds good solutions over an order of magnitude faster than Bayesian optimization and converges to the best solutions orders of magnitude faster than Hyperband. Figure 1 illustrates this pattern in a nutshell for optimizing six hyperparameters of a neural network.
Researcher Affiliation | Academia | 1Department of Computer Science, University of Freiburg, Freiburg, Germany.
Pseudocode | Yes | Algorithm 1: Pseudocode for Hyperband using Successive Halving (SH) as a subroutine. ... Algorithm 2: Pseudocode for sampling in BOHB
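The Successive Halving subroutine referenced in Algorithm 1 can be sketched as follows. This is a simplified illustration, not the authors' implementation; the function name and the `evaluate(config, budget)` interface are assumptions. The idea: evaluate all configurations on a small budget, keep the best 1/η fraction, multiply the budget by η, and repeat until the maximum budget is reached.

```python
def successive_halving(configs, evaluate, min_budget, max_budget, eta=3):
    """Evaluate configs on geometrically increasing budgets, advancing
    only the top 1/eta fraction at each rung (lower loss is better)."""
    budget = min_budget
    while configs:
        losses = [evaluate(c, budget) for c in configs]
        if budget >= max_budget:
            # Final rung: return the best surviving configuration.
            return min(zip(losses, configs))[1]
        # Keep the best 1/eta fraction for the next, eta-times-larger budget.
        ranked = sorted(zip(losses, configs))
        configs = [c for _, c in ranked[: max(1, len(configs) // eta)]]
        budget = min(budget * eta, max_budget)
```

Hyperband then calls this routine repeatedly with different trade-offs between the number of initial configurations and the starting budget; BOHB keeps this schedule but replaces the random sampling of configurations with model-based sampling (Algorithm 2).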
Open Source Code | Yes | Code for BOHB and our benchmarks is publicly available at https://github.com/automl/HpBandSter
Open Datasets | Yes | To compare against GP-BO, we used the support vector machine on MNIST surrogate from Klein et al. (2017a). ... We optimized six hyperparameters that control the training procedure ... for six different datasets gathered from OpenML (Vanschoren et al., 2014): Adult (Kohavi, 1996), Higgs (Baldi et al., 2014), Letter (Frey & Slate, 1991), MNIST (LeCun et al., 2001), Optdigits (Dheeru & Karra Taniskidou, 2017), and Poker (Cattral et al., 2002). ... We considered two UCI (Dheeru & Karra Taniskidou, 2017) regression datasets, Boston housing and protein structure as described by Hernández-Lobato & Adams (2015).
Dataset Splits | Yes | To perform hyperparameter optimization, we split off 5 000 training images as a validation set.
Hardware Specification | Yes | Each worker used 2 NVIDIA TI 1080 GPUs for parallel training
Software Dependencies | No | To compare against TPE, we used the Hyperopt package (Bergstra et al., 2011), and for all GP-BO methods we used the RoBO python package (Klein et al., 2017b). ... We used the KDE implementation from statsmodels (Seabold & Perktold, 2010)... We used the Bayesian neural network implementation provided in the RoBO python package (Klein et al., 2017b) as described by Springenberg et al. (2016). ... For PPO, we used the implementation from the TensorForce framework developed by Schaarschmidt et al. (2017) and we used the implementation from OpenAI Gym (Brockman et al., 2016). The paper lists software packages and frameworks used but does not provide specific version numbers for them.
Experiment Setup | Yes | In all experiments we set η = 3 for HB and BOHB as recommended by Li et al. (2017). ... We ran BOHB with budgets of 22, 66, 200, and 600 epochs, using 19 parallel workers. ... As hyperparameters, we optimized learning rate, momentum, weight decay, and batch size.
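The budget list in the setup above follows Hyperband's geometric spacing: with η = 3 and a maximum budget of 600 epochs, consecutive budgets differ by a factor of η. A minimal sketch of that schedule (the function name and integer rounding are illustrative assumptions, not from the paper):

```python
def hyperband_budgets(max_budget, eta, num_budgets):
    """Geometric budget schedule used by Hyperband/BOHB: each budget is
    eta times the previous one, with the last equal to max_budget."""
    return [int(max_budget / eta ** i) for i in reversed(range(num_budgets))]

print(hyperband_budgets(600, 3, 4))  # -> [22, 66, 200, 600]
```

This reproduces the paper's reported budgets of 22, 66, 200, and 600 epochs.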