Neural Architecture Search with Reinforcement Learning
Authors: Barret Zoph, Quoc V. Le
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. |
| Researcher Affiliation | Industry | Barret Zoph, Quoc V. Le, Google Brain, {barretzoph,qvl}@google.com |
| Pseudocode | No | The paper describes algorithms in text and uses figures to illustrate concepts (e.g., Figure 2, Figure 4, Figure 5), but does not provide explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The code for running the models found by the controller on CIFAR-10 and PTB will be released at https://github.com/tensorflow/models. Additionally, we have added the RNN cell found using our method under the name NASCell into TensorFlow, so others can easily use it. |
| Open Datasets | Yes | We apply our method to an image classification task with CIFAR-10 and a language modeling task with Penn Treebank, two of the most benchmarked datasets in deep learning. |
| Dataset Splits | Yes | The validation set has 5,000 examples randomly sampled from the training set, the remaining 45,000 examples are used for training. |
| Hardware Specification | No | The paper mentions training on '800 GPUs concurrently' for CIFAR-10 and '400 CPUs concurrently' for Penn Treebank, but does not specify the GPU or CPU models or other detailed hardware specifications (e.g., NVIDIA GPU model, Intel CPU type). |
| Software Dependencies | No | The paper mentions software like 'ADAM optimizer' and 'Momentum Optimizer', and integrating a cell 'into TensorFlow', but does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | The controller RNN is a two-layer LSTM with 35 hidden units on each layer. It is trained with the ADAM optimizer (Kingma & Ba, 2015) with a learning rate of 0.0006. The weights of the controller are initialized uniformly between -0.08 and 0.08. For the distributed training, we set the number of parameter server shards S to 20, the number of controller replicas K to 100 and the number of child replicas m to 8... a child model is constructed and trained for 50 epochs... We use the Momentum Optimizer with a learning rate of 0.1, weight decay of 1e-4, momentum of 0.9 and used Nesterov Momentum (Sutskever et al., 2013). |
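
The Experiment Setup row above quotes the controller and child-model hyperparameters. Below is a minimal TensorFlow 1.x sketch that wires up those reported values; the controller architecture itself, the REINFORCE training loop, and the distributed setup are omitted, and all shape and variable names are illustrative placeholders rather than the authors' code.

```python
import tensorflow as tf

# Controller: two-layer LSTM with 35 hidden units per layer, weights
# initialized uniformly in [-0.08, 0.08] (values quoted from the paper).
init = tf.random_uniform_initializer(-0.08, 0.08)
controller_cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.LSTMCell(35, initializer=init) for _ in range(2)])

# Controller optimizer: ADAM with a learning rate of 0.0006.
controller_opt = tf.train.AdamOptimizer(learning_rate=0.0006)

# Child-model optimizer: Momentum with learning rate 0.1, momentum 0.9,
# and Nesterov momentum. The reported weight decay of 1e-4 would be added
# to the child loss separately (not shown here).
child_opt = tf.train.MomentumOptimizer(
    learning_rate=0.1, momentum=0.9, use_nesterov=True)

# Distributed-training constants quoted from the paper.
NUM_PS_SHARDS = 20             # parameter server shards S
NUM_CONTROLLER_REPLICAS = 100  # controller replicas K
NUM_CHILD_REPLICAS = 8         # child replicas m per controller
CHILD_EPOCHS = 50              # each sampled child model trains for 50 epochs
```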
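The Dataset Splits row above quotes a 45,000/5,000 train/validation split of the CIFAR-10 training set. A minimal sketch of reproducing such a split, assuming NumPy and the tf.keras CIFAR-10 loader; the paper does not specify the sampling procedure or seed, so the seed below is an assumption.

```python
import numpy as np
import tensorflow as tf

# Load the 50,000 CIFAR-10 training examples.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()

# Randomly sample 5,000 examples for validation; the remaining 45,000
# are used for training. The seed is an assumption, not a paper setting.
rng = np.random.RandomState(0)
perm = rng.permutation(len(x_train))
val_idx, train_idx = perm[:5000], perm[5000:]

x_val, y_val = x_train[val_idx], y_train[val_idx]
x_tr, y_tr = x_train[train_idx], y_train[train_idx]
```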
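The Open Source Code row above notes that the discovered recurrent cell was added to TensorFlow under the name NASCell. A minimal usage sketch, assuming a TensorFlow 1.x environment where tf.contrib.rnn.NASCell is available; the sequence length, input dimension, and hidden size here are illustrative, not the paper's settings.

```python
import tensorflow as tf

# Illustrative placeholder: a batch of sequences with 50 timesteps and
# 32-dimensional inputs (assumed shapes, not paper settings).
inputs = tf.placeholder(tf.float32, [None, 50, 32])

# The NAS-discovered recurrent cell shipped with TensorFlow 1.x.
cell = tf.contrib.rnn.NASCell(num_units=128)

# Unroll the cell over the sequence like any other RNNCell.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```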