Learning to Infer Program Sketches

Authors: Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, Armando Solar-Lezama

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5. Experiments: We provide the results of evaluating SKETCHADAPT in three test domains.
Researcher Affiliation	Collaboration	(1) MIT Brain and Cognitive Sciences; (2) MIT CSAIL; (3) MIT-IBM AI Lab; (4) Center for Brains, Minds and Machines (CBMM).
Pseudocode	Yes	Algorithm 1 SKETCHADAPT Training and Algorithm 2 SKETCHADAPT Evaluation
Open Source Code	No	The paper does not provide an explicit statement or a link to its own open-source code.
Open Datasets	Yes	We use the list processing DSL from Balog et al. (2016)... As our test corpus, we used string editing problems from the SyGuS (Alur et al., 2016) program synthesis competition, and string editing tasks used in Ellis et al. (2018)... Our final evaluation domain is the AlgoLisp DSL and dataset, introduced in Polosukhin & Skidanov (2018).
Dataset Splits	Yes	We trained our model on programs of length 3, and tested its performance on two datasets, one consisting of 100 programs of length 3, and the other with 100 length 4 programs. Therefore, we train our model on subsets of the data of various sizes to test generalization. Figure 4 and Table 4 depict our main results for this domain, testing all systems with a maximum timeout of 300 seconds per task. As in Bednarek et al. (2018), we filter the test and dev datasets for only those tasks for which reference programs satisfy the given specs. The filtered version is also used for Figure 4.
Hardware Specification	No	The paper does not provide any specific hardware details used for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	When using a beam size of 10 on the full dataset, SKETCHADAPT and the Generator only RNN baseline far exceed previously reported state-of-the-art performance and achieve near-perfect accuracy, whereas the Synthesizers only model is unable to achieve high performance... testing all systems with a maximum timeout of 300 seconds per task... SketchAdapt, beam 100 (ours) and Generator only, beam 50 (RobustFill).