Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis

Authors: Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 EXPERIMENTS |
| Researcher Affiliation | Collaboration | Rudy Bunel (University of Oxford, rudy@robots.ox.ac.uk); Matthew Hausknecht (Microsoft Research, matthew.hausknecht@microsoft.com); Jacob Devlin (Google, jacobdevlin@google.com); Rishabh Singh (Microsoft Research, risin@microsoft.com); Pushmeet Kohli (DeepMind, pushmeet@google.com) |
| Pseudocode | No | The paper provides a DSL specification but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code and data will be made available. |
| Open Datasets | Yes | The Karel DSL was previously used by Devlin et al. (2017a) to study the relative performances of a range of methods depending on the available amount of data. |
| Dataset Splits | Yes | 5,000 programs are held out from training and split between a validation set and a test set. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | Our models are implemented using the PyTorch framework. |
| Experiment Setup | Yes | The decoders are two-layer LSTMs with a hidden size of 256. Tokens of the DSL are embedded to a 256-dimensional vector... All training is performed using the Adam optimizer, with a learning rate of 10^-4. Supervised training used a batch size of 128 and RL methods used a batch size of 16. We used 100 rollouts per sample for the Reinforce method and a beam size of 64 for the beam-search-based methods. The value of C used for the methods computing a loss on bags of programs was 5. (Hedged sketches of this configuration follow the table.) |
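The Experiment Setup row pins down the decoder architecture and optimizer hyperparameters. Since the authors' code is not released, the following is only a minimal PyTorch sketch of that configuration; the class name `KarelDecoder` and the vocabulary size are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 52  # assumed size of the Karel DSL token vocabulary (not stated above)

class KarelDecoder(nn.Module):
    """Hypothetical decoder matching the reported hyperparameters."""

    def __init__(self, vocab_size=VOCAB_SIZE, embed_dim=256, hidden_size=256):
        super().__init__()
        # DSL tokens are embedded into a 256-dimensional vector.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The decoder is a two-layer LSTM with a hidden size of 256.
        self.lstm = nn.LSTM(embed_dim, hidden_size, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, state=None):
        emb = self.embed(tokens)              # (batch, seq, 256)
        hidden, state = self.lstm(emb, state)  # (batch, seq, 256)
        return self.out(hidden), state         # per-step token logits

model = KarelDecoder()
# All training uses Adam with a learning rate of 10^-4;
# supervised training used batch size 128, RL methods batch size 16.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```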
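The table also reports 100 rollouts per sample for the Reinforce method. A minimal sketch of a REINFORCE-style loss under that setting is shown below; `model.sample` and `reward_fn` are assumed interfaces, since the paper excerpt specifies only the rollout count, not the sampling or reward code.

```python
import torch

def reinforce_loss(model, inputs, reward_fn, num_rollouts=100):
    """Sketch of a REINFORCE objective: weight rollout log-probs by reward.

    `model.sample(inputs)` is assumed to return a sampled program and the
    log-probabilities of its tokens; `reward_fn` scores the program, e.g.
    1.0 if it satisfies all I/O examples and 0.0 otherwise.
    """
    losses = []
    for _ in range(num_rollouts):
        program, log_probs = model.sample(inputs)
        reward = reward_fn(program)
        # Gradient ascent on expected reward = descent on -reward * log p.
        losses.append(-reward * log_probs.sum())
    return torch.stack(losses).mean()
```

In the reported setup this loss would be averaged over RL batches of 16 samples before the Adam update; per-sample baselines or reward shaping, if any, are not described in the excerpt and are omitted here.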