RobustFill: Neural Program Learning under Noisy I/O

Authors: Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here, for the first time, we directly compare both approaches on a large-scale, real-world learning task and we additionally contrast to rule-based program synthesis, which uses hand-crafted semantics to guide the program generation. Our neural models use a modified attention RNN to allow encoding of variable-sized sets of I/O pairs, which achieve 92% accuracy on a real-world test set, compared to the 34% accuracy of the previous best neural synthesis approach." (A sketch of this set encoding appears after the table.)
Researcher Affiliation | Collaboration | ¹Microsoft Research, Redmond, Washington, USA; ²MIT, Cambridge, Massachusetts, USA. Correspondence to: Jacob Devlin <jdevlin@microsoft.com>.
Pseudocode | No | The paper describes its architectures and methods in text and diagrams but provides no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper makes no statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | "For evaluating the trained models, we use Flash Fill Test, a set of 205 real-world examples collected from Microsoft Excel spreadsheets, and provided to us by the authors of Gulwani et al. (2012) and Parisotto et al. (2017)."
Dataset Splits | No | "A small amount of hyperparameter tuning was done on a synthetic validation set that was generated like the training." The paper mentions this synthetic validation set but gives no quantitative split information (e.g., percentages or sample counts) for it.
Hardware Specification | Yes | "Training took approximately 24 hours on 2 Pascal Titan X GPUs, using an in-house toolkit. ... the amortized end-to-end cost of decoding is roughly 0.3 seconds per test instance for Attention-C-DP w/ Beam=100 and four observed examples (89% accuracy), on a Pascal Titan X GPU." (A generic beam-search sketch appears after the table.)
Software Dependencies | No | The paper mentions an "in-house toolkit", training with "plain SGD with gradient clipping", and "Microsoft Excel 2016" as a comparison point, but it specifies no version numbers for any programming language, library, or other software component needed for replication.
Experiment Setup | Yes | "In all experiments, the size of the recurrent and fully connected layers is 512, and the size of the embeddings is 128. Models were trained with plain SGD with gradient clipping. All models were trained for 2 million minibatch updates, where each minibatch contained 128 training instances (i.e., 128 programs with four I/O examples each)." (A training-loop sketch using these settings appears after the table.)
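
The Research Type row quotes the paper's central architectural claim: a modified attention RNN that encodes a variable-sized set of I/O pairs. In the paper's best-performing models, the decoder is replicated with shared weights across the observed examples and the per-example hidden states are max-pooled before the output softmax ("late pooling"). Below is a minimal single-step sketch of that idea, assuming PyTorch; the class name, the single-head attention, and the fused input/output encoding are our simplifications, not the authors' implementation.

```python
import torch
import torch.nn as nn

H, E, V = 512, 128, 128  # hidden, embedding, and vocabulary sizes (sizes from the paper)

class LatePoolingStep(nn.Module):
    """One decoder step: encode each I/O example separately, attend to each
    example's encoding, then max-pool across the variable number of examples."""
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(V, E)
        self.io_enc = nn.LSTM(E, H, batch_first=True)
        self.attn = nn.MultiheadAttention(H, num_heads=1, batch_first=True)
        self.dec = nn.LSTMCell(E, H)
        self.proj = nn.Linear(H, V)

    def forward(self, io_tokens, prog_token, state=None):
        # io_tokens: (n_examples, seq_len) token ids for one task's I/O strings
        # prog_token: 0-d tensor holding the previous program token id
        n = io_tokens.size(0)
        memory, _ = self.io_enc(self.tok(io_tokens))        # (n, T, H)
        if state is None:
            state = (torch.zeros(n, H), torch.zeros(n, H))
        h, c = self.dec(self.tok(prog_token).expand(n, -1), state)
        ctx, _ = self.attn(h.unsqueeze(1), memory, memory)  # per-example attention
        pooled = (h + ctx.squeeze(1)).max(dim=0).values     # late pooling over examples
        return self.proj(pooled), (h, c)                    # logits over program tokens

# Example: four I/O examples of length 20, predicting the next program token.
logits, state = LatePoolingStep()(torch.randint(0, V, (4, 20)), torch.tensor(1))
```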
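
The decoding cost in the Hardware Specification row refers to beam search with beam size 100. As a point of reference, here is a generic token-level beam search in Python; it is a sketch, not the authors' decoder, and it omits the DP-based pruning of program prefixes that the paper layers on top of beam search in the Attention-C-DP variant.

```python
import heapq

def beam_search(step_fn, init_state, beam_size=100, max_len=32, eos=0):
    """Generic beam search. step_fn(prefix, state) must return a list of
    (token, log_prob, new_state) continuations for the given prefix."""
    beam = [(0.0, [], init_state)]          # (cumulative log-prob, tokens, state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, prefix, state in beam:
            if prefix and prefix[-1] == eos:
                finished.append((score, prefix))  # hypothesis is complete
                continue
            for tok, logp, new_state in step_fn(prefix, state):
                candidates.append((score + logp, prefix + [tok], new_state))
        if not candidates:
            break
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    finished.extend((s, p) for s, p, _ in beam)  # keep unfinished hypotheses too
    return max(finished, key=lambda c: c[0], default=None)

# Toy usage: token 0 is EOS; this step_fn always proposes the same two tokens.
best_score, best_tokens = beam_search(
    lambda prefix, state: [(0, -0.5, state), (1, -0.9, state)], None, beam_size=3)
```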
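
The Experiment Setup row fixes the layer sizes, optimizer, update count, and minibatch composition, but not the learning rate or clipping threshold. The sketch below wires the quoted settings into a runnable PyTorch training loop; the stand-in network, random data, SEQ_LEN, LR, and CLIP_NORM are placeholder assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Sizes quoted in the paper; LR, CLIP_NORM, and SEQ_LEN are NOT reported
# and are placeholder assumptions, as is the random stand-in data.
HIDDEN, EMBED, VOCAB, SEQ_LEN = 512, 128, 128, 10
BATCH, EXAMPLES_PER_PROGRAM = 128, 4
NUM_UPDATES = 2_000_000            # the paper's total; shortened below for a smoke test
LR, CLIP_NORM = 0.1, 5.0

# Tiny stand-in for the full attentional seq-to-seq model; its fully
# connected layer uses the paper's size of 512.
model = nn.Sequential(nn.Embedding(VOCAB, EMBED), nn.Flatten(),
                      nn.Linear(EMBED * SEQ_LEN, HIDDEN), nn.ReLU(),
                      nn.Linear(HIDDEN, VOCAB))
optimizer = torch.optim.SGD(model.parameters(), lr=LR)  # "plain SGD"
loss_fn = nn.CrossEntropyLoss()

for update in range(3):            # replace 3 with NUM_UPDATES for a real run
    # One minibatch: 128 programs x 4 I/O examples each; random token ids
    # stand in for the synthetically sampled programs and strings.
    x = torch.randint(0, VOCAB, (BATCH * EXAMPLES_PER_PROGRAM, SEQ_LEN))
    y = torch.randint(0, VOCAB, (BATCH * EXAMPLES_PER_PROGRAM,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)  # gradient clipping
    optimizer.step()
```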