Designing Neural Network Architectures using Reinforcement Learning
Authors: Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments with a space of model architectures consisting of only standard convolution, pooling, and fully connected layers using three standard image classification datasets: CIFAR-10, SVHN, and MNIST. The learning agent discovers CNN architectures that beat all existing networks designed only with the same layer types (e.g., Springenberg et al. (2014); Srivastava et al. (2015)). In addition, their performance is competitive against network designs that include complex layer types and training procedures (e.g., Clevert et al. (2015); Lee et al. (2016)). Finally, the MetaQNN selected models comfortably outperform previous automated network design methods (Stanley & Miikkulainen, 2002; Bergstra et al., 2013). |
| Researcher Affiliation | Academia | Bowen Baker, Otkrist Gupta, Nikhil Naik & Ramesh Raskar Media Laboratory Massachusetts Institute of Technology Cambridge MA 02139, USA {bowen, otkrist, naik, raskar}@mit.edu |
| Pseudocode | Yes | Appendix A (Algorithm): Algorithm 1 Q-learning For CNN Topologies; Algorithm 2 SAMPLE NEW NETWORK(ϵ, Q); Algorithm 3 UPDATE Q VALUES(Q, S, U, accuracy). A minimal Q-learning sketch follows the table. |
| Open Source Code | Yes | For more information, model files, and code, please visit https://bowenbaker.github.io/metaqnn/ |
| Open Datasets | Yes | We conduct experiments with a space of model architectures consisting of only standard convolution, pooling, and fully connected layers using three standard image classification datasets: CIFAR-10, SVHN, and MNIST. |
| Dataset Splits | Yes | For each experiment, we created a validation set by randomly taking 5,000 samples from the training set such that the resulting class distributions were unchanged. A stratified-split sketch follows the table. |
| Hardware Specification | No | Our experiments using Caffe (Jia et al., 2014) took 8-10 days to complete for each dataset with a hardware setup consisting of 10 NVIDIA GPUs. (The GPU count is reported, but no GPU model or other hardware details are given.) |
| Software Dependencies | No | Our experiments using Caffe (Jia et al., 2014) took 8-10 days to complete for each dataset with a hardware setup consisting of 10 NVIDIA GPUs. (No version for Caffe or other libraries is specified.) |
| Experiment Setup | Yes | For every network, a dropout layer was added after every two layers. The ith dropout layer, out of a total of n dropout layers, had a dropout probability of i/(2n). Each model was trained for a total of 20 epochs with the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. The batch size was set to 128, and the initial learning rate was set to 0.001. A configuration sketch follows the table. |
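
The pseudocode row refers to the paper's Q-learning procedures (epsilon-greedy sampling of a layer sequence, terminal reward equal to the trained network's validation accuracy). Below is a minimal Python sketch of that loop; the toy action space, the state encoding, and the numeric hyperparameters are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the Q-learning loop outlined in Algorithms 1-3.
# The action space, state encoding, and hyperparameter values are assumptions.
import random
from collections import defaultdict

Q = defaultdict(float)       # Q[(state, action)] -> expected validation accuracy
ALPHA, GAMMA = 0.1, 1.0      # Q-learning rate and discount (assumed values)
MAX_LAYERS = 12              # depth cap so a rollout always terminates

def actions_for(state):
    """Legal next layers from a state (toy action space for illustration)."""
    return [] if state == "terminate" else ["conv", "pool", "fc", "terminate"]

def sample_new_network(epsilon):
    """Algorithm 2: roll out one architecture with an epsilon-greedy policy."""
    state, trajectory = "start", []
    while state != "terminate":
        actions = actions_for(state)
        if len(trajectory) >= MAX_LAYERS:
            action = "terminate"
        elif random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        trajectory.append((state, action))
        state = action       # the chosen layer becomes the next state
    return trajectory

def update_q_values(trajectory, accuracy):
    """Algorithm 3: backward Q update with the accuracy as the only reward."""
    reward = accuracy
    for state, action in reversed(trajectory):
        best_next = max((Q[(action, a)] for a in actions_for(action)), default=0.0)
        Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] \
                             + ALPHA * (reward + GAMMA * best_next)
        reward = 0.0         # intermediate transitions get zero reward

# Usage: sample a trajectory, train the corresponding CNN (not shown), then
# feed its validation accuracy back into the Q table.
trajectory = sample_new_network(epsilon=1.0)
update_q_values(trajectory, accuracy=0.72)   # 0.72 is a made-up accuracy
```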
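
The dataset-splits row describes carving a 5,000-sample validation set out of the training data while keeping class proportions unchanged. One way to reproduce such a split is a stratified split, sketched below with scikit-learn; the paper does not say which tooling the authors used, so this is an assumption.

```python
# Stratified validation split: 5,000 samples taken from the training set,
# with per-class proportions preserved. Tool choice is an assumption.
from sklearn.model_selection import train_test_split

def make_validation_split(x_train, y_train, val_size=5000, seed=0):
    """Return (train, validation) parts with unchanged class distributions."""
    x_tr, x_val, y_tr, y_val = train_test_split(
        x_train, y_train,
        test_size=val_size,     # absolute number of validation samples
        stratify=y_train,       # keep per-class proportions
        random_state=seed,
    )
    return x_tr, y_tr, x_val, y_val
```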
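
The experiment-setup row specifies the dropout schedule (the ith of n dropout layers uses probability i/(2n)) and the Adam settings. The sketch below expresses that configuration; the paper's experiments used Caffe, so the PyTorch layers and sizes here are purely illustrative, with only the dropout schedule and optimizer values following the quoted setup.

```python
# Sketch of the quoted training configuration: dropout probability i/(2n)
# for the i-th of n dropout layers, plus the stated Adam hyperparameters.
# Layer sizes and placement details are assumptions for illustration.
import torch.nn as nn
import torch.optim as optim

def dropout_prob(i, n):
    """Dropout probability of the i-th of n dropout layers (1-indexed): i / (2n)."""
    return i / (2 * n)

# Toy stack with n = 2 dropout layers, one after every two weight layers.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Dropout(p=dropout_prob(1, 2)),          # p = 1/4
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Dropout(p=dropout_prob(2, 2)),          # p = 2/4
)

# Adam and batch settings as quoted in the setup row.
optimizer = optim.Adam(model.parameters(), lr=0.001,
                       betas=(0.9, 0.999), eps=1e-8)
batch_size, num_epochs = 128, 20
```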