Neural Architecture Search with Reinforcement Learning

Authors: Barret Zoph, Quoc Le

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model.
Researcher Affiliation | Industry | Barret Zoph, Quoc V. Le, Google Brain, {barretzoph,qvl}@google.com
Pseudocode | No | The paper describes algorithms in text and uses figures to illustrate concepts (e.g., Figure 2, Figure 4, Figure 5), but does not provide explicitly labeled 'Pseudocode' or 'Algorithm' blocks. (An editorial sketch of the described controller loop follows the table.)
Open Source Code | No | The code for running the models found by the controller on CIFAR-10 and PTB will be released at https://github.com/tensorflow/models. Additionally, we have added the RNN cell found using our method under the name NASCell into TensorFlow, so others can easily use it. (A NASCell usage sketch follows the table.)
Open Datasets | Yes | We apply our method to an image classification task with CIFAR-10 and a language modeling task with Penn Treebank, two of the most benchmarked datasets in deep learning.
Dataset Splits | Yes | The validation set has 5,000 examples randomly sampled from the training set, the remaining 45,000 examples are used for training. (A split sketch follows the table.)
Hardware Specification | No | The paper mentions training on '800 GPUs concurrently' for CIFAR-10 and '400 CPUs concurrently' for Penn Treebank, but does not specify the hardware models or other detailed specifications (e.g., NVIDIA GPU model, Intel CPU type).
Software Dependencies | No | The paper mentions software such as the 'ADAM optimizer' and 'Momentum Optimizer', and integrating a cell 'into TensorFlow', but does not provide specific version numbers for any of these software components or libraries.
Experiment Setup | Yes | The controller RNN is a two-layer LSTM with 35 hidden units on each layer. It is trained with the ADAM optimizer (Kingma & Ba, 2015) with a learning rate of 0.0006. The weights of the controller are initialized uniformly between -0.08 and 0.08. For the distributed training, we set the number of parameter server shards S to 20, the number of controller replicas K to 100 and the number of child replicas m to 8... a child model is constructed and trained for 50 epochs... We use the Momentum Optimizer with a learning rate of 0.1, weight decay of 1e-4, momentum of 0.9 and used Nesterov Momentum (Sutskever et al., 2013). (A configuration sketch follows the table.)
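
The Pseudocode row notes that the paper describes its algorithm only in prose and figures. As an editorial aid, the sketch below summarizes that description: a controller RNN samples a child architecture, the child is trained and its validation accuracy is used as a reward, and the controller is updated with REINFORCE against a moving-average baseline. This is a minimal, hypothetical sketch, not code from the paper; the callables sample_architecture, train_child, and update_controller are placeholders.

```python
# Editorial sketch of the controller training loop described in the paper's text.
# The callables passed in are hypothetical placeholders, not the authors' code.

def nas_controller_loop(controller,
                        sample_architecture,   # controller -> (arch_tokens, log_probs)
                        train_child,           # arch_tokens -> validation accuracy R
                        update_controller,     # (controller, log_probs, advantage) -> None
                        num_iterations=10000,
                        baseline_decay=0.95):
    """REINFORCE-style architecture search with a moving-average baseline."""
    baseline = 0.0
    for _ in range(num_iterations):
        # 1. Sample a child architecture (filter sizes, strides, skip
        #    connections, ...) token by token from the controller RNN.
        arch_tokens, log_probs = sample_architecture(controller)

        # 2. Train the child network and read off its validation accuracy,
        #    which serves as the reward R.
        reward = train_child(arch_tokens)

        # 3. Policy-gradient (REINFORCE) update: scale the sampled decisions'
        #    log-probabilities by (R - baseline) to reduce variance.
        update_controller(controller, log_probs, reward - baseline)

        # 4. Maintain an exponential moving average of rewards as the baseline.
        baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
```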
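
The Open Source Code row quotes the authors' statement that the discovered recurrent cell was added to TensorFlow under the name NASCell. The snippet below is a minimal usage sketch assuming a TensorFlow 1.x installation that exposes the cell as tf.contrib.rnn.NASCell; the exact module path may differ between TensorFlow releases, and the input shape is illustrative.

```python
import tensorflow as tf  # assumes TensorFlow 1.x with the contrib API available

# [batch, time, features]; the shape here is illustrative only.
inputs = tf.placeholder(tf.float32, [None, 35, 128])

# The NAS-discovered recurrent cell, used like any other RNN cell.
cell = tf.contrib.rnn.NASCell(num_units=64)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```

Because the cell follows the standard RNNCell interface, it can be substituted wherever an LSTM cell would otherwise be used.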
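
The Dataset Splits row quotes a 45,000/5,000 train/validation split of the CIFAR-10 training set. A minimal sketch of such a split is shown below; the use of the Keras CIFAR-10 loader, NumPy, and a fixed seed are editorial assumptions, not choices stated in the paper.

```python
import numpy as np
from tensorflow.keras.datasets import cifar10  # loader choice is an editorial assumption

(x_train, y_train), (x_test, y_test) = cifar10.load_data()  # 50,000 train / 10,000 test

rng = np.random.default_rng(seed=0)      # fixed seed is an editorial choice
perm = rng.permutation(len(x_train))     # random order over the 50,000 training examples

val_idx, train_idx = perm[:5000], perm[5000:]
x_val, y_val = x_train[val_idx], y_train[val_idx]            # 5,000-example validation set
x_train, y_train = x_train[train_idx], y_train[train_idx]    # remaining 45,000 for training
```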
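
For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionary layout and key names below are editorial; the values are taken from the quoted text.

```python
# Editorial transcription of the hyperparameters quoted above; key names are assumptions.

controller_config = {
    "architecture": "2-layer LSTM",           # controller RNN
    "hidden_units_per_layer": 35,
    "optimizer": "Adam",
    "learning_rate": 6e-4,
    "weight_init": ("uniform", -0.08, 0.08),  # uniform initialization range
}

distributed_config = {
    "parameter_server_shards_S": 20,
    "controller_replicas_K": 100,
    "child_replicas_per_controller_m": 8,
}

child_config = {
    "epochs": 50,
    "optimizer": "Momentum (Nesterov)",
    "learning_rate": 0.1,
    "weight_decay": 1e-4,
    "momentum": 0.9,
}
```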