Efficient Forward Architecture Search
Authors: Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric J. Horvitz, Debadeepta Dey
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report the search results on CIFAR-10 (Krizhevsky, 2009) and the transfer result on ImageNet (Russakovsky et al., 2015). Petridish is particularly well-suited for warm-starting from existing models, which is crucial for lifelong-learning scenarios. (Section 5: Experiments) |
| Researcher Affiliation | Collaboration | Hanzhang Hu¹, John Langford², Rich Caruana², Saurajit Mukherjee², Eric Horvitz², Debadeepta Dey²; ¹Carnegie Mellon University, ²Microsoft Research; hanzhang@cs.cmu.edu, {jcl,rcaruana,saurajim,horvitz,dedey}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Petridish.initialize_candidates. Algorithm 2 Petridish.finalize_candidates. |
| Open Source Code | No | The paper does not contain a statement explicitly releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We report the search results on CIFAR-10 (Krizhevsky, 2009) and the transfer result on ImageNet (Russakovsky et al., 2015). We also search on Penn Treebank (Marcus et al., 1993), and show that it is not an interesting dataset for evaluating NAS algorithms. |
| Dataset Splits | Yes | During search, we use the last 5000 training images as a validation set. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | Petridish cell search finds a model with a 2.87 ± 0.13% error rate with 2.5M parameters, in 5 GPU-days using a GTX 1080. |
| Software Dependencies | No | The paper mentions software components and frameworks like 'ResNet', 'SGD', and 'PyramidNet', but does not specify version numbers for any programming languages, libraries, or software used in the experiments (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The seed model is trained for 200 epochs, with a batch size of 32 and a learning rate that decays from 0.025 to 0 with cosine decay (Loshchilov & Hutter, 2017). We apply drop-path (Larsson et al., 2017) with probability 0.6 and the standard CIFAR-10 Cutout (DeVries & Taylor, 2017). Weak-learner selection and finalization are trained for 80 epochs each, using the same parameters. The final model is trained from scratch for 600 epochs on all training images with the same parameters. We train these models [the ImageNet transfer models] for 250 epochs with batch size 128, weight decay 3 × 10⁻⁵, and an initial SGD learning rate of 0.1 (decayed by a factor of 0.97 per epoch). (A training-configuration sketch follows the table.) |
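
The dataset split quoted above is straightforward to reproduce. Below is a minimal sketch, not the authors' released code, assuming a PyTorch/torchvision pipeline: the last 5,000 of CIFAR-10's 50,000 training images are held out as the search-time validation set.

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Download CIFAR-10 and keep the standard 50,000-image training set.
transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transform)

# Hold out the last 5,000 training images as the search-time validation set.
num_train = len(full_train)   # 50,000
split = num_train - 5000      # 45,000
train_set = Subset(full_train, range(0, split))
val_set = Subset(full_train, range(split, num_train))

# Batch size 32 matches the search-phase setting quoted in the table.
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)
```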
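
The hyperparameters in the Experiment Setup row map onto standard PyTorch optimizer and scheduler objects. The sketch below illustrates that mapping and is not the paper's training script: the momentum value and the placeholder model are assumptions not stated in the quoted text, and drop-path and Cutout regularization are omitted.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)  # placeholder model; the paper's searched cells are not reproduced here

# CIFAR-10 training: SGD with a cosine-decayed learning rate from 0.025 to 0
# (200 epochs for the seed model, 600 epochs for the final from-scratch run).
epochs = 200
optimizer = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=0.0)

# ImageNet transfer training: 250 epochs, batch size 128, weight decay 3e-5,
# initial learning rate 0.1 decayed by a factor of 0.97 every epoch.
imagenet_optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=3e-5)
imagenet_scheduler = torch.optim.lr_scheduler.ExponentialLR(imagenet_optimizer, gamma=0.97)

for epoch in range(epochs):
    # ... train for one epoch (batch size 32 on CIFAR-10) ...
    scheduler.step()
```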