Hyperparameter optimization: a spectral approach
Authors: Elad Hazan, Adam Klivans, Yang Yuan
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments for training deep neural networks on Cifar-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search. |
| Researcher Affiliation | Collaboration | Elad Hazan, Princeton University and Google Brain (ehazan@cs.princeton.edu); Adam Klivans, Department of Computer Science, University of Texas at Austin (klivans@cs.utexas.edu); Yang Yuan, Department of Computer Science, Cornell University (yangyuan@cs.cornell.edu) |
| Pseudocode | Yes | Algorithm 1 Harmonica-1; Procedure 2 Polynomial Sparse Recovery (PSR); Algorithm 3 Harmonica-q. (A hedged sketch of the sparse-recovery step appears after this table.) |
| Open Source Code | Yes | A python implementation of Harmonica can be found at https://github.com/callowbird/Harmonica |
| Open Datasets | Yes | Our first experiment is over training residual network on Cifar-10 dataset [9]. ... [9] https://github.com/facebook/fb.resnet.torch |
| Dataset Splits | No | The paper mentions training and test phases, but does not explicitly describe a separate validation split or how it was used in terms of percentages, counts, or methodology. It refers to "training epochs" and "test error" but not a validation set. |
| Hardware Specification | No | The paper mentions "GPU Day" as a unit of measurement for running time and states "6.1 GPU days" and "20 GPUs running in parallel." However, it does not specify the model or type of GPU, CPU, or any other hardware component used for the experiments. |
| Software Dependencies | No | The paper mentions several software tools and libraries used or compared against, such as "Spearmint (Snoek et al., 2012)", "Hyperband", "SH", "Random Search", "Lasso (Tibshirani, 1996)", and provides GitHub links for "Harmonica" and "fb.resnet.torch". However, it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Our first experiment is over training residual network on Cifar-10 dataset [9]. We included 39 binary hyperparameters, including initialization, optimization method, learning rate schedule, momentum rate, etc. Table 1 (Section C.1) details the hyperparameters considered. ... More specifically, during the feature selection stages, we run Harmonica for tuning an 8 layer neural network with 30 training epochs. ... as our base algorithm on the big 56 layer neural network for training the whole 160 epochs. (A hedged sketch of this staged setup appears after the table.) |
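
The Pseudocode row names the paper's three procedures: Harmonica-1, Polynomial Sparse Recovery (PSR), and Harmonica-q. The sketch below illustrates the core PSR idea as described in the paper: fit a sparse, low-degree polynomial in a {-1, +1} encoding of the binary hyperparameters via Lasso, then read off the variables appearing in the largest monomials. The function and parameter names here (`monomial_features`, `psr`, `degree`, `n_monomials`, `alpha`) are illustrative assumptions, not the API of the authors' Harmonica repository.

```python
# Minimal sketch of Polynomial Sparse Recovery (PSR) via Lasso over the
# low-degree monomial (parity) basis of {-1, +1} hyperparameter encodings.
# Names and defaults are assumptions for illustration only.

from itertools import combinations

import numpy as np
from sklearn.linear_model import Lasso


def monomial_features(X, degree):
    """Expand +/-1 configurations X (T x n) into all monomials of degree <= `degree`.

    Returns the feature matrix and the list of index tuples defining each monomial.
    """
    n = X.shape[1]
    index_sets = [s for d in range(1, degree + 1) for s in combinations(range(n), d)]
    features = np.column_stack([np.prod(X[:, list(s)], axis=1) for s in index_sets])
    return features, index_sets


def psr(X, y, degree=2, n_monomials=5, alpha=0.1):
    """Sparse recovery of a low-degree polynomial approximation of y = f(X).

    Returns the top monomials (as index tuples) and the variables they involve.
    """
    features, index_sets = monomial_features(X, degree)
    model = Lasso(alpha=alpha).fit(features, y)
    top = np.argsort(-np.abs(model.coef_))[:n_monomials]
    top_monomials = [index_sets[i] for i in top]
    important_vars = sorted({v for s in top_monomials for v in s})
    return top_monomials, important_vars


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, T = 20, 300
    X = rng.choice([-1.0, 1.0], size=(T, n))
    # Toy objective that depends on only a few variables, plus noise.
    y = 2.0 * X[:, 3] - 1.5 * X[:, 3] * X[:, 7] + 0.1 * rng.standard_normal(T)
    print(psr(X, y))
```

On the toy objective above, the recovered monomials should involve variables 3 and 7, which is the mechanism Harmonica uses to decide which hyperparameters to fix before the next stage.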
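
The Experiment Setup row describes a staged search: feature-selection stages are run on a cheap proxy (an 8-layer network trained for 30 epochs), and the base algorithm is then applied to the full 56-layer network trained for 160 epochs on the reduced search space. The sketch below, which assumes the `psr` helper from the previous sketch is in scope, shows one way such a staged loop could be organized; `staged_search`, `cheap_eval`, `expensive_eval`, and the stage/budget parameters are hypothetical names, and plain random search stands in here for whatever base algorithm is actually used.

```python
# Sketch of a staged search: cheap proxy evaluations drive a few
# feature-selection stages, then a base algorithm (random search here)
# runs on the reduced space with the expensive full-size evaluation.
# Assumes `psr` from the previous sketch; all names are illustrative.

import numpy as np


def staged_search(cheap_eval, expensive_eval, n_vars, n_stages=2,
                  samples_per_stage=100, budget=20, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    fixed = {}  # hyperparameter index -> value in {-1, +1} decided so far

    def fill(x_free, free_idx):
        """Assemble a full configuration from fixed values and free coordinates."""
        x = np.empty(n_vars)
        for i, v in fixed.items():
            x[i] = v
        x[free_idx] = x_free
        return x

    free_idx = list(range(n_vars))
    for _ in range(n_stages):
        # Sample configurations of the still-free variables and score them cheaply.
        X = rng.choice([-1.0, 1.0], size=(samples_per_stage, len(free_idx)))
        y = np.array([cheap_eval(fill(x, free_idx)) for x in X])
        # Sparse recovery picks out the influential free variables.
        _, important = psr(X, y, degree=2, n_monomials=5)
        # Fix each important variable to its value in the best-scoring sample.
        best = X[np.argmin(y)]
        for j in important:
            fixed[free_idx[j]] = best[j]
        free_idx = [i for i in range(n_vars) if i not in fixed]

    # Base algorithm on the reduced space, using the expensive evaluation.
    best_x, best_y = None, np.inf
    for _ in range(budget):
        x = fill(rng.choice([-1.0, 1.0], size=len(free_idx)), free_idx)
        val = expensive_eval(x)
        if val < best_y:
            best_x, best_y = x, val
    return best_x, best_y
```

In the paper's setup, `cheap_eval` would correspond to training the small 8-layer network for 30 epochs and `expensive_eval` to training the 56-layer network for 160 epochs, which is what makes the feature-selection stages inexpensive relative to the final search.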