BOHB: Robust and Efficient Hyperparameter Optimization at Scale
Authors: Stefan Falkner, Aaron Klein, Frank Hutter
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive empirical evaluation (Section 5) demonstrates that our method combines the best aspects of Bayesian optimization and Hyperband: it often finds good solutions over an order of magnitude faster than Bayesian optimization and converges to the best solutions orders of magnitude faster than Hyperband. Figure 1 illustrates this pattern in a nutshell for optimizing six hyperparameters of a neural network. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Freiburg, Freiburg, Germany. |
| Pseudocode | Yes | Algorithm 1: Pseudocode for Hyperband using Successive Halving (SH) as a subroutine. ... Algorithm 2: Pseudocode for sampling in BOHB |
| Open Source Code | Yes | Code for BOHB and our benchmarks is publicly available at https://github.com/automl/HpBandSter |
| Open Datasets | Yes | To compare against GP-BO, we used the support vector machine on MNIST surrogate from Klein et al. (2017a). ... We optimized six hyperparameters that control the training procedure ... for six different datasets gathered from OpenML (Vanschoren et al., 2014): Adult (Kohavi, 1996), Higgs (Baldi et al., 2014), Letter (Frey & Slate, 1991), MNIST (LeCun et al., 2001), Optdigits (Dheeru & Karra Taniskidou, 2017), and Poker (Cattral et al., 2002). ... We considered two UCI (Dheeru & Karra Taniskidou, 2017) regression datasets, Boston housing and protein structure as described by Hernández-Lobato & Adams (2015). |
| Dataset Splits | Yes | To perform hyperparameter optimization, we split off 5 000 training images as a validation set. |
| Hardware Specification | Yes | Each worker used 2 NVIDIA 1080 Ti GPUs for parallel training |
| Software Dependencies | No | To compare against TPE, we used the Hyperopt package (Bergstra et al., 2011), and for all GP-BO methods we used the RoBO python package (Klein et al., 2017b). ... We used the KDE implementation from statsmodels (Seabold & Perktold, 2010)... We used the Bayesian neural network implementation provided in the RoBO python package (Klein et al., 2017b) as described by Springenberg et al. (2016). ... For PPO, we used the implementation from the TensorForce framework developed by Schaarschmidt et al. (2017) and we used the implementation from OpenAI Gym (Brockman et al., 2016). The paper lists software packages and frameworks used but does not provide specific version numbers for them. |
| Experiment Setup | Yes | In all experiments we set η = 3 for HB and BOHB as recommended by Li et al. (2017). ... We ran BOHB with budgets of 22, 66, 200, and 600 epochs, using 19 parallel workers. ... As hyperparameters, we optimized learning rate, momentum, weight decay, and batch size. |
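The quoted setup (η = 3 with budgets of 22, 66, 200, and 600 epochs) follows Hyperband's geometric budget schedule: successive budgets differ by a factor of η. A minimal sketch of how those numbers arise from a maximum budget of 600 epochs; the floor-rounding rule is an assumption, not taken from the paper:

```python
# Hyperband-style geometric budget schedule (sketch).
# eta and max_budget are the values quoted in the Experiment Setup row;
# the use of integer truncation when dividing is an assumption.
eta = 3
max_budget = 600  # epochs
n_budgets = 4     # number of rungs, matching the four quoted budgets

budgets = [int(max_budget / eta**k) for k in reversed(range(n_budgets))]
print(budgets)  # [22, 66, 200, 600]
```

Each rung gives surviving configurations η times the budget of the previous one, which is what lets Successive Halving discard poor configurations cheaply before spending full budgets.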
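The Software Dependencies row notes that BOHB's model is a KDE (via statsmodels). To make the mechanism concrete, here is a hedged, self-contained sketch of the TPE-style density-ratio sampling that Algorithm 2 describes: fit one density on the best-performing configurations and one on the rest, then propose the candidate maximizing their ratio. The 1-D toy objective, the hand-rolled Gaussian KDE, the bandwidth, and the good/bad split fraction are all illustrative assumptions, not the paper's implementation:

```python
import math
import random

random.seed(0)

def kde(points, bw=0.1):
    """Return a simple 1-D Gaussian kernel density estimator (illustrative,
    not the statsmodels KDE the paper uses)."""
    norm = len(points) * bw * math.sqrt(2 * math.pi)
    def pdf(x):
        return sum(math.exp(-0.5 * ((x - p) / bw) ** 2) for p in points) / norm
    return pdf

# Hypothetical observations: configurations in [0, 1] with a synthetic loss
# whose optimum sits at 0.3.
configs = [random.random() for _ in range(48)]
losses = [(c - 0.3) ** 2 for c in configs]

# Split observations into "good" and "bad" by loss rank (the split fraction
# here is an assumption) and fit a density on each set.
ranked = sorted(zip(losses, configs))
n_good = len(ranked) // 4
good_pdf = kde([c for _, c in ranked[:n_good]])
bad_pdf = kde([c for _, c in ranked[n_good:]])

# Sample candidates and keep the maximizer of the density ratio l(x)/g(x).
candidates = [random.random() for _ in range(64)]
best = max(candidates, key=lambda x: good_pdf(x) / max(bad_pdf(x), 1e-32))
print(round(best, 3))  # a value near the optimum at 0.3
```

The point of the ratio is that it concentrates proposals where good configurations are dense and bad ones are sparse, which is what gives BOHB its model-based guidance on top of Hyperband's budget allocation.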