Synthetic Datasets for Neural Program Synthesis

Authors: Richard Shin, Neel Kant, Kavi Gupta, Chris Bender, Brandon Trabucco, Rishabh Singh, Dawn Song

Venue: ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.'
Researcher Affiliation | Collaboration | Richard Shin (UC Berkeley); Neel Kant (UC Berkeley and ML@B); Kavi Gupta (UC Berkeley); Christopher Bender (UC Berkeley and ML@B); Brandon Trabucco (UC Berkeley and ML@B); Rishabh Singh (Google Brain); Dawn Song (UC Berkeley)
Pseudocode | Yes | For full pseudocode, see Section B.1 in the appendix.
Open Source Code | No | The paper mentions 'tensor2tensor (Vaswani et al., 2018), an open-source deep learning library' as a tool the authors used, but does not state that their own implementation is open source or provide a link to it.
Open Datasets | No | The paper mentions using a 'provided synthetic training set' from Bunel et al. (2018) and describes generating new synthetic datasets, but does not provide access information (link, DOI, repository, or formal citation with authors and year) that would make the generated datasets publicly available.
Dataset Splits | No | The paper mentions 'existing validation and test sets' and describes generating new training and test sets, but provides neither percentage splits nor sample counts for the original datasets, nor explicit numerical splits for the newly generated ones, so the data partitioning cannot be reproduced.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions 'tensor2tensor (Vaswani et al., 2018)' but does not give version numbers for it or any other software dependency, which would be needed for reproducibility.
Experiment Setup | No | The paper states 'We reproduced the encoder-decoder model of Bunel et al. (2018) and trained it using the provided synthetic training set with the teacher-forcing maximum likelihood objective,' but does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings.
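
For context on the objective named in the last row: below is a minimal, purely illustrative sketch of teacher-forcing maximum likelihood training for a sequence-to-sequence model. The paper's own implementation used tensor2tensor; the PyTorch model, vocabulary sizes, and random data here are hypothetical placeholders, not the authors' code.

```python
# Illustrative sketch only: teacher-forcing maximum likelihood training for a
# tiny seq2seq model. All names and sizes are made up for this example.
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.src_emb(src))           # encode the spec
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), h)
        return self.out(dec_out)                         # per-step logits

model = TinySeq2Seq(src_vocab=50, tgt_vocab=50)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(0, 50, (8, 12))   # batch of input specifications
tgt = torch.randint(0, 50, (8, 10))   # batch of gold target programs

# Teacher forcing: the decoder is conditioned on the gold prefix tgt[:, :-1]
# and trained to predict the next gold token tgt[:, 1:] at every position,
# i.e. the maximum likelihood objective over the target sequence.
logits = model(src, tgt[:, :-1])
loss = loss_fn(logits.reshape(-1, 50), tgt[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```

The point of the sketch is the loss computation: under teacher forcing, the decoder never consumes its own predictions during training, which is exactly the setting where the reported hyperparameters (learning rate, batch size, epochs) would be needed to reproduce the paper's numbers.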