Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Neural Optimizer Search with Reinforcement Learning
Authors: Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. These optimizers can also be transferred to perform well on different neural network architectures, including Google's neural machine translation system. |
| Researcher Affiliation | Industry | ¹Google Brain. Correspondence to: Irwan Bello <EMAIL>, Barret Zoph <EMAIL>, Vijay Vasudevan <EMAIL>, Quoc V. Le <EMAIL>. |
| Pseudocode | No | The paper describes the architecture and process but does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | These child networks are trained on the CIFAR-10 dataset, one of the most benchmarked datasets in deep learning. |
| Dataset Splits | Yes | The child networks have a batch size of 100 and evaluate the update rule on a fixed held-out validation set of 5,000 examples. |
| Hardware Specification | No | The paper mentions 'CPUs' and 'GPUs' but does not specify exact models or types (e.g., 'Intel Core i7' or 'NVIDIA A100'). |
| Software Dependencies | No | The paper mentions 'TensorFlow (Abadi et al., 2016)' but does not provide a specific version number for it or any other software dependencies. |
| Experiment Setup | Yes | Across all experiments, our controller RNN is trained with the ADAM optimizer with a learning rate of 10⁻⁵ and a minibatch size of 5. The controller is a single-layer LSTM with hidden state size 150 and weights are initialized uniformly at random between -0.08 and 0.08. We also use an entropy penalty to aid in exploration. This entropy penalty coefficient is set to 0.0015. ... We set ϵ to 10⁻⁸, β1 to 0.9 and β2 = β3 to 0.999. |
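
The values quoted in the Dataset Splits and Experiment Setup rows can be gathered into a single configuration object, which may help when attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name, field names, and the `init_uniform` helper are hypothetical, and only the numbers come from the table. The role of ϵ, β1, β2, and β3 is not stated in the excerpt (they follow an elision), so they are recorded without interpretation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; all field names are hypothetical."""

    controller_learning_rate: float = 1e-5  # Adam learning rate for the controller RNN
    controller_minibatch_size: int = 5
    lstm_hidden_size: int = 150             # controller is a single-layer LSTM
    init_bound: float = 0.08                # weights drawn uniformly from [-0.08, 0.08]
    entropy_penalty_coeff: float = 0.0015
    child_batch_size: int = 100             # batch size of the child networks
    validation_examples: int = 5000         # fixed held-out validation set
    # Quoted after an elision ("..."), so their exact role is not stated in the excerpt.
    epsilon: float = 1e-8
    beta1: float = 0.9
    beta2: float = 0.999
    beta3: float = 0.999


def init_uniform(count, bound, seed=0):
    """Draw `count` weights uniformly from [-bound, bound] (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.uniform(-bound, bound) for _ in range(count)]


setup = ReportedSetup()
weights = init_uniform(setup.lstm_hidden_size, setup.init_bound)
```

The dataclass is frozen so the reported values cannot be changed by accident during an experiment; a variation would be expressed with `dataclasses.replace(setup, ...)`, which keeps any deviation from the reported setup explicit.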