Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization

Authors: Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar

ICLR 2017

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We compare HYPERBAND with popular Bayesian Optimization methods on several hyperparameter optimization problems. We observe that HYPERBAND can provide more than an order of magnitude speedups over competitors on a variety of neural network and kernel-based learning problems."

Researcher Affiliation | Collaboration | UCLA, UC Berkeley, NYU, and Google ({lishal,ameet}@cs.ucla.edu, kjamieson@berkeley.edu, desalvo@cims.nyu.edu, rostami@google.com)

Pseudocode | Yes | "Algorithm 1: HYPERBAND algorithm for hyperparameter optimization."

Open Source Code | No | The paper links a competitor's implementation ("The package provided by Klein et al. (2016) is available at https://github.com/automl/RoBO.") but gives no link to, or explicit statement about, open-source code for HYPERBAND itself.

Open Datasets | Yes | "Datasets: We considered three image classification datasets: CIFAR-10 (Krizhevsky, 2009), rotated MNIST with background images (MRBI) (Larochelle et al., 2007), and Street View House Numbers (SVHN) (Netzer et al., 2011)."

Dataset Splits | Yes | "The splits used for each dataset are as follows: (1) CIFAR-10 has 40k, 10k, and 10k instances; (2) MRBI has 10k, 2k, and 50k instances; and (3) SVHN has close to 600k, 6k, and 26k instances for training, validation, and test respectively."

Hardware Specification | Yes | "Each hyperparameter optimization algorithm is run for ten trials on Amazon EC2 m4.2xlarge instances; for a given trial, HYPERBAND is allowed to run for two outer loops, bracket s = 4 is repeated 10 times, and all other searchers are run for 12 hours."

Software Dependencies | No | The paper mentions software frameworks such as cuda-convnet and the Caffe framework but does not specify version numbers for these or any other software dependencies, which are crucial for reproducibility.

Experiment Setup | Yes | "Our search space includes learning rate, batch size, and number of kernels for the two layers of the network as hyperparameters (details are shown in Table 3 in Appendix A)."
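The Pseudocode row notes that the paper gives Algorithm 1 for HYPERBAND: an outer loop over brackets s and an inner SuccessiveHalving loop that allocates geometrically increasing resources to the surviving configurations. A minimal Python sketch of that structure follows; `get_config` and `run_config` are placeholder callables (my names, not the paper's) standing in for the user's search space and training routine, and this is an illustrative sketch rather than the authors' released implementation.

```python
import math
import random

def hyperband(get_config, run_config, max_resource=81, eta=3):
    """Sketch of the Hyperband bracket structure (after Li et al., Algorithm 1).

    get_config() -> one randomly sampled hyperparameter configuration
    run_config(config, resource) -> validation loss (lower is better)
    max_resource is R (max resource per configuration); eta is the halving rate.
    """
    # s_max = floor(log_eta(R)); the small epsilon guards against
    # floating-point log returning, e.g., 2.9999999999999996 for log_3(27).
    s_max = int(math.floor(math.log(max_resource) / math.log(eta) + 1e-9))
    budget = (s_max + 1) * max_resource  # total resource B per bracket
    best = (float("inf"), None)          # (loss, config) seen so far

    for s in range(s_max, -1, -1):       # one bracket per value of s
        # Initial number of configurations and resource per configuration.
        n = int(math.ceil(budget / max_resource * eta**s / (s + 1)))
        r = max_resource * eta**(-s)
        configs = [get_config() for _ in range(n)]

        # SuccessiveHalving inner loop: evaluate, then keep the top 1/eta.
        for i in range(s + 1):
            n_i = int(n * eta**(-i))
            r_i = r * eta**i
            losses = [run_config(c, r_i) for c in configs]
            ranked = sorted(zip(losses, configs), key=lambda t: t[0])
            if ranked and ranked[0][0] < best[0]:
                best = ranked[0]
            configs = [c for _, c in ranked[: int(n_i / eta)]]

    return best
```

With `eta=3` and `max_resource=27`, for example, bracket s=3 starts 27 configurations on a small resource and repeatedly discards two thirds of them, while bracket s=0 runs just a few configurations at the full resource; this spread over brackets is what hedges Hyperband against a bad choice of aggressiveness.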