Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
Authors: Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 experiments |
| Researcher Affiliation | Collaboration | Rudy Bunel (University of Oxford, rudy@robots.ox.ac.uk); Matthew Hausknecht (Microsoft Research, matthew.hausknecht@microsoft.com); Jacob Devlin (Google, jacobdevlin@google.com); Rishabh Singh (Microsoft Research, risin@microsoft.com); Pushmeet Kohli (DeepMind, pushmeet@google.com) |
| Pseudocode | No | The paper provides a DSL specification but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states only that "Code and data will be made available"; no repository link is given. |
| Open Datasets | Yes | The Karel DSL was previously used by Devlin et al. (2017a) to study the relative performances of a range of methods depending on the available amount of data. |
| Dataset Splits | Yes | 5,000 programs are held out from training and split between a validation set and a test set. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper states only that "Our models are implemented using the PyTorch framework"; no version numbers are specified. |
| Experiment Setup | Yes | The decoders are two-layer LSTMs with a hidden size of 256. Tokens of the DSL are embedded into 256-dimensional vectors... All training is performed using the Adam optimizer with a learning rate of 10^-4. Supervised training used a batch size of 128; RL methods used a batch size of 16. We used 100 rollouts per sample for the Reinforce method and a beam size of 64 for the beam-search-based methods. The value of C used for the methods computing a loss on bags of programs was 5. (A configuration sketch follows the table.) |
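The paper does not state how the 5,000 held-out programs are divided between validation and test. Below is a minimal sketch of such a split, assuming an even 2,500/2,500 division and placeholder data in place of the real Karel corpus:

```python
import random

# Placeholder for the Karel program corpus; the real dataset loading
# is not shown, as the authors' code was not released.
programs = [f"prog_{i}" for i in range(100_000)]

random.seed(0)
random.shuffle(programs)

# "5000 programs are not used for training, and get split out
# between a validation set and a test set."
train, held_out = programs[:-5000], programs[-5000:]

# Assumption: an even split; the paper does not give the proportions.
val, test = held_out[:2500], held_out[2500:]
```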
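The quoted hyperparameters map onto a straightforward PyTorch configuration. The sketch below covers only the decoder and optimizer; the class name, vocabulary size, and output projection are illustrative assumptions, not the authors' (unreleased) implementation:

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 52    # assumption: number of tokens in the Karel DSL
EMBED_DIM = 256    # "Tokens of the DSL are embedded to a 256 dimensional vector"
HIDDEN_DIM = 256   # "two-layer LSTM with a hidden size of 256"
NUM_LAYERS = 2

class Decoder(nn.Module):
    """Token decoder matching the reported sizes (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM,
                            num_layers=NUM_LAYERS, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tokens, state=None):
        # tokens: LongTensor of shape (batch, seq_len)
        h, state = self.lstm(self.embedding(tokens), state)
        return self.out(h), state

decoder = Decoder()
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)  # "learning rate of 10^-4"

# Remaining quoted settings, as constants:
BATCH_SIZE_SUPERVISED = 128  # supervised training
BATCH_SIZE_RL = 16           # RL methods
NUM_ROLLOUTS = 100           # rollouts per sample, Reinforce
BEAM_SIZE = 64               # beam-search-based methods
C = 5                        # loss on bags of programs

# Smoke test of the decoder shapes.
logits, _ = decoder(torch.zeros(4, 10, dtype=torch.long))
assert logits.shape == (4, 10, VOCAB_SIZE)
```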