Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
Authors: Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 EXPERIMENTS |
| Researcher Affiliation | Collaboration | Rudy Bunel, University of Oxford; Matthew Hausknecht, Microsoft Research; Jacob Devlin, Google; Rishabh Singh, Microsoft Research; Pushmeet Kohli, DeepMind |
| Pseudocode | No | The paper provides a DSL specification but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code and data will be made available. |
| Open Datasets | Yes | The Karel DSL was previously used by Devlin et al. (2017a) to study the relative performances of a range of methods depending on the available amount of data. |
| Dataset Splits | Yes | 5000 programs are not used for training and are split between a validation set and a test set. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | Our models are implemented using the PyTorch framework. |
| Experiment Setup | Yes | The decoders are two-layer LSTMs with a hidden size of 256. Tokens of the DSL are embedded into a 256-dimensional vector... All training is performed using the Adam optimizer, with a learning rate of 10⁻⁴. Supervised training used a batch size of 128 and RL methods used a batch size of 16. We used 100 rollouts per sample for the REINFORCE method and a beam size of 64 for the methods based on beam search. The value of C used for the methods computing a loss on bags of programs was 5. |
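The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which is one way a reimplementation might record them. The sketch below is hypothetical: the class name `ReportedSetup`, its field names, and the helper `candidates_per_rl_batch` are illustrative and do not come from the authors' code. The helper gives a back-of-envelope count of candidate programs per RL batch, assuming that every example in a batch of 16 receives its own 100 rollouts or its own beam of width 64.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    decoder_layers: int = 2                 # two-layer LSTM decoders
    hidden_size: int = 256                  # LSTM hidden size
    token_embedding_dim: int = 256          # DSL token embedding size
    optimizer: str = "adam"
    learning_rate: float = 1e-4
    batch_size_supervised: int = 128
    batch_size_rl: int = 16
    reinforce_rollouts_per_sample: int = 100
    beam_size: int = 64
    c_bag_losses: int = 5                   # value of C for the bag-of-programs losses


def candidates_per_rl_batch(cfg: ReportedSetup, method: str) -> int:
    """Rough count of programs decoded per RL batch.

    Assumption: each example in the batch gets its own rollouts (REINFORCE)
    or its own beam (beam-search-based methods).
    """
    if method == "reinforce":
        return cfg.batch_size_rl * cfg.reinforce_rollouts_per_sample
    if method == "beam":
        return cfg.batch_size_rl * cfg.beam_size
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    cfg = ReportedSetup()
    print(candidates_per_rl_batch(cfg, "reinforce"))  # 16 * 100
    print(candidates_per_rl_batch(cfg, "beam"))       # 16 * 64
```

Under that assumption, a REINFORCE batch involves 1600 sampled programs and a beam-search batch 1024 candidates, which helps explain why the RL batch size is much smaller than the supervised one.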
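The Dataset Splits row states only that 5000 programs are held out from training and divided between validation and test; it does not say how many go to each set or how they are chosen. The sketch below is a hypothetical illustration of such a split: the function name `split_dataset`, the random shuffle, the fixed seed, and the `n_val` parameter are all assumptions, and `n_val` must be supplied by the caller because the row does not give the ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    programs: Sequence[T], n_held_out: int, n_val: int, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Hold out n_held_out programs, then divide them into validation and test.

    Assumptions: the held-out programs are chosen by a seeded random shuffle,
    and n_val (validation size) is a hypothetical, caller-chosen parameter.
    """
    if not 0 <= n_val <= n_held_out <= len(programs):
        raise ValueError("need 0 <= n_val <= n_held_out <= len(programs)")
    shuffled = list(programs)
    random.Random(seed).shuffle(shuffled)   # deterministic for a given seed
    held_out, train = shuffled[:n_held_out], shuffled[n_held_out:]
    return train, held_out[:n_val], held_out[n_val:]
```

Fixing the seed keeps the held-out programs identical across runs, which matters when several methods are compared on the same validation and test sets.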