Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization
Authors: Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare HYPERBAND with popular Bayesian Optimization methods on several hyperparameter optimization problems. We observe that HYPERBAND can provide more than an order of magnitude speedups over competitors on a variety of neural network and kernel-based learning problems. |
| Researcher Affiliation | Collaboration | UCLA, UC Berkeley, NYU, and Google {lishal,ameet}@cs.ucla.edu, kjamieson@berkeley.edu desalvo@cims.nyu.edu, rostami@google.com |
| Pseudocode | Yes | Algorithm 1: HYPERBAND algorithm for hyperparameter optimization. |
| Open Source Code | No | The paper provides a link to a competitor's code ('The package provided by Klein et al. (2016) is available at https://github.com/automl/RoBO.') but does not provide a link or explicit statement about the open-source code for HYPERBAND itself. |
| Open Datasets | Yes | Datasets: We considered three image classification datasets: CIFAR-10 (Krizhevsky, 2009), rotated MNIST with background images (MRBI) (Larochelle et al., 2007), and Street View House Numbers (SVHN) (Netzer et al., 2011). |
| Dataset Splits | Yes | The splits used for each dataset are as follows: (1) CIFAR-10 has 40k, 10k, and 10k instances; (2) MRBI has 10k, 2k, and 50k instances; and (3) SVHN has close to 600k, 6k, and 26k instances for training, validation, and test respectively. |
| Hardware Specification | Yes | Each hyperparameter optimization algorithm is run for ten trials on Amazon EC2 m4.2xlarge instances; for a given trial, HYPERBAND is allowed to run for two outer loops, bracket s = 4 is repeated 10 times, and all other searchers are run for 12 hours. |
| Software Dependencies | No | The paper mentions software frameworks like 'cuda-convnet' and 'Caffe framework' but does not specify version numbers for these or any other software dependencies, which are crucial for reproducibility. |
| Experiment Setup | Yes | Our search space includes learning rate, batch size, and number of kernels for the two layers of the network as hyperparameters (details are shown in Table 3 in Appendix A). |