Efficient Forward Architecture Search
Authors: Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric J. Horvitz, Debadeepta Dey
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report the search results on CIFAR-10 (Krizhevsky, 2009) and the transfer result on ImageNet (Russakovsky et al., 2015). Petridish is particularly well-suited for warm-starting from existing models, which is crucial for lifelong-learning scenarios. (Section 5: Experiments) |
| Researcher Affiliation | Collaboration | Hanzhang Hu¹, John Langford², Rich Caruana², Saurajit Mukherjee², Eric Horvitz², Debadeepta Dey²; ¹Carnegie Mellon University, ²Microsoft Research; hanzhang@cs.cmu.edu, {jcl,rcaruana,saurajim,horvitz,dedey}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Petridish.initialize_candidates. Algorithm 2 Petridish.finalize_candidates. |
| Open Source Code | No | The paper does not contain a statement explicitly releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We report the search results on CIFAR-10 (Krizhevsky, 2009) and the transfer result on ImageNet (Russakovsky et al., 2015). We also search on Penn Treebank (Marcus et al., 1993), and show that it is not an interesting dataset for evaluating NAS algorithms. |
| Dataset Splits | Yes | During search, we use the last 5000 training images as a validation set. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | Petridish cell search finds a model with a 2.87 ± 0.13% error rate with 2.5M parameters, in 5 GPU-days using a GTX 1080. |
| Software Dependencies | No | The paper mentions software components and frameworks like 'ResNet', 'SGD', and 'PyramidNet', but does not specify version numbers for any programming languages, libraries, or software used in the experiments (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The seed model is trained for 200 epochs, with a batch size of 32 and a learning rate that decays from 0.025 to 0 with cosine decay (Loshchilov & Hutter, 2017). We apply drop-path (Larsson et al., 2017) with probability 0.6 and the standard CIFAR-10 Cutout (DeVries & Taylor, 2017). Weak-learner selection and finalization are trained for 80 epochs each, using the same parameters. The final model is trained from scratch for 600 epochs on all training images with the same parameters. We train these models [the ImageNet transfer models] for 250 epochs with batch size 128, weight decay 3 × 10⁻⁵, and an initial SGD learning rate of 0.1 (decayed by a factor of 0.97 per epoch). (A training-configuration sketch follows the table.) |
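
The dataset split quoted above is straightforward to reproduce. Below is a minimal sketch, not the authors' released code, assuming a PyTorch/torchvision pipeline: the last 5,000 of CIFAR-10's 50,000 training images are held out as the search-time validation set.

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Download CIFAR-10 and keep the standard 50,000-image training set.
transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transform)

# Hold out the last 5,000 training images as the search-time validation set.
num_train = len(full_train)   # 50,000
split = num_train - 5000      # 45,000
train_set = Subset(full_train, range(0, split))
val_set = Subset(full_train, range(split, num_train))

# Batch size 32 matches the search-phase setting quoted in the table.
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)
```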
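
The hyperparameters in the Experiment Setup row map onto standard PyTorch optimizer and scheduler objects. The sketch below illustrates that mapping and is not the paper's training script: the momentum value and the placeholder model are assumptions not stated in the quoted text, and drop-path and Cutout regularization are omitted.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)  # placeholder model; the paper's searched cells are not reproduced here

# CIFAR-10 training: SGD with a cosine-decayed learning rate from 0.025 to 0
# (200 epochs for the seed model, 600 epochs for the final from-scratch run).
epochs = 200
optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=0.0)

# ImageNet transfer training: 250 epochs, batch size 128, weight decay 3e-5,
# initial learning rate 0.1 decayed by a factor of 0.97 every epoch.
imagenet_optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=3e-5)
imagenet_scheduler = torch.optim.lr_scheduler.ExponentialLR(imagenet_optimizer, gamma=0.97)

for epoch in range(epochs):
    # ... train for one epoch (batch size 32 on CIFAR-10) ...
    scheduler.step()
```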