Learning to Infer Program Sketches

Authors: Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, Armando Solar-Lezama

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments: We provide the results of evaluating SKETCHADAPT in three test domains.
Researcher Affiliation | Collaboration | (1) MIT Brain and Cognitive Sciences; (2) MIT CSAIL; (3) MIT-IBM AI Lab; (4) Center for Brains, Minds and Machines (CBMM).
Pseudocode | Yes | Algorithm 1 (SKETCHADAPT Training) and Algorithm 2 (SKETCHADAPT Evaluation).
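Algorithms 1 and 2 are only named here, not reproduced. The evaluation idea behind them is that a learned generator proposes program sketches containing holes, and a synthesizer then fills those holes and checks each candidate against the input/output examples. The Python below is a rough, hypothetical illustration of that idea and rests on several assumptions. The four-primitive list DSL, the `HOLE` token, and the functions `run`, `fill_sketch`, and `solve` are invented for the example. A hand-written list of sketches stands in for the neural generator's beam. Holes are filled by brute-force enumeration rather than by the paper's synthesizer.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical toy DSL (illustration only): each primitive maps a list of ints to a list of ints.
PRIMITIVES: Dict[str, Callable[[List[int]], List[int]]] = {
    "REVERSE": lambda xs: list(reversed(xs)),
    "SORT": lambda xs: sorted(xs),
    "MAP(*2)": lambda xs: [2 * x for x in xs],
    "FILTER(>0)": lambda xs: [x for x in xs if x > 0],
}
HOLE = "<HOLE>"  # hypothetical placeholder token marking an unfilled position in a sketch

Example = Tuple[List[int], List[int]]  # (input, expected output)


def run(program: Sequence[str], xs: List[int]) -> List[int]:
    """Execute a straight-line program by applying each primitive left to right."""
    for op in program:
        xs = PRIMITIVES[op](xs)
    return xs


def fill_sketch(sketch: Sequence[str], examples: List[Example]) -> Optional[Tuple[str, ...]]:
    """Enumerate every assignment of primitives to the holes; return the first consistent program."""
    holes = [i for i, tok in enumerate(sketch) if tok == HOLE]
    for filling in product(PRIMITIVES, repeat=len(holes)):
        program = list(sketch)
        for pos, op in zip(holes, filling):
            program[pos] = op
        if all(run(program, inp) == out for inp, out in examples):
            return tuple(program)
    return None


def solve(sketches: List[Sequence[str]], examples: List[Example]) -> Optional[Tuple[str, ...]]:
    """Try sketches in order (a stand-in for a generator's ranked beam) until one can be completed."""
    for sketch in sketches:
        program = fill_sketch(sketch, examples)
        if program is not None:
            return program
    return None


examples = [([3, -1, 2, 5], [4, 6, 10]), ([5, 0, 1], [2, 10])]
print(solve([["REVERSE", HOLE], ["FILTER(>0)", HOLE, "MAP(*2)"]], examples))
# → ('FILTER(>0)', 'SORT', 'MAP(*2)')
```

In this run, no completion of the first sketch satisfies the examples, so the search moves on to the second sketch. The design point is that the sketch fixes most of the program, so only one hole needs to be enumerated rather than the whole program.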
Open Source Code | No | The paper does not provide an explicit statement or a link to its own open-source code.
Open Datasets | Yes | We use the list processing DSL from Balog et al. (2016)... As our test corpus, we used string editing problems from the SyGuS (Alur et al., 2016) program synthesis competition, and string editing tasks used in Ellis et al. (2018)... Our final evaluation domain is the AlgoLisp DSL and dataset, introduced in Polosukhin & Skidanov (2018).
Dataset Splits | Yes | We trained our model on programs of length 3, and tested its performance on two datasets, one consisting of 100 programs of length 3, and the other of 100 programs of length 4... Therefore, we train our model on subsets of the data of various sizes to test generalization... Figure 4 and Table 4 depict our main results for this domain, testing all systems with a maximum timeout of 300 seconds per task... As in Bednarek et al. (2018), we filter the test and dev datasets for only those tasks for which reference programs satisfy the given specs. The filtered version is also used for Figure 4.
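The filtering step quoted above (keeping only tasks whose reference program satisfies the given specs) can be illustrated with a minimal Python sketch. It assumes each task bundles its input/output examples with an executable reference program. The `Task` class and the functions `reference_satisfies_spec` and `filter_tasks` are hypothetical, and this is not the code used by the paper or by Bednarek et al. (2018).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    name: str
    examples: List[Tuple[List[int], List[int]]]  # (input, expected output) pairs forming the spec
    reference: Callable[[List[int]], List[int]]  # hypothetical executable reference program


def reference_satisfies_spec(task: Task) -> bool:
    """True if the reference program reproduces every expected output in the spec."""
    try:
        return all(task.reference(inp) == out for inp, out in task.examples)
    except Exception:
        # A reference program that crashes on a spec input does not satisfy the spec.
        return False


def filter_tasks(tasks: List[Task]) -> List[Task]:
    """Keep only tasks whose reference program is consistent with its own spec."""
    return [t for t in tasks if reference_satisfies_spec(t)]
```

For example, a task whose reference is `sorted` but whose spec expects `[2, 1]` from input `[1, 2]` would be dropped, as would a task whose reference raises an exception on one of its inputs.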
Hardware Specification | No | The paper does not specify the hardware used to run its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | When using a beam size of 10 on the full dataset, SKETCHADAPT and the Generator only RNN baseline far exceed previously reported state-of-the-art performance and achieve near-perfect accuracy, whereas the Synthesizer only model is unable to achieve high performance... testing all systems with a maximum timeout of 300 seconds per task... SKETCHADAPT, beam 100 (ours) and Generator only, beam 50 (RobustFill).
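The beam sizes quoted above (10, 50, 100) set how many candidate sequences a decoder keeps at each step. The Python below is a minimal, generic beam-search sketch that illustrates what that parameter controls. It is not the paper's decoder. The callback `next_token_logprobs` is a hypothetical stand-in for a trained RNN, and `toy_model` uses made-up probabilities purely for demonstration.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[str]]  # (cumulative log-probability, tokens)


def beam_search(
    next_token_logprobs: Callable[[List[str]], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "<EOS>",
) -> List[Hypothesis]:
    """Keep the `beam_size` highest-scoring partial sequences at each decoding step."""
    beams: List[Hypothesis] = [(0.0, [])]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (score + lp, seq + [tok])
            for score, seq in beams
            for tok, lp in next_token_logprobs(seq).items()
        ]
        # Prune to the beam_size best expansions; completed ones leave the beam.
        beams = []
        for score, seq in heapq.nlargest(beam_size, candidates, key=lambda c: c[0]):
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    return sorted(finished, key=lambda c: c[0], reverse=True)[:beam_size]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained sequence model (made-up probabilities)."""
    table = {
        0: {"A": 0.6, "B": 0.3, "<EOS>": 0.1},
        1: {"A": 0.5, "B": 0.4, "<EOS>": 0.1},
    }
    probs = table.get(len(prefix), {"<EOS>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


for score, seq in beam_search(toy_model, beam_size=2, max_len=5):
    print(round(math.exp(score), 2), seq)
# → 0.3 ['A', 'A', '<EOS>']
# → 0.24 ['A', 'B', '<EOS>']
```

A larger beam keeps more alternatives alive at each step, which costs more computation per step but makes it less likely that a good candidate is pruned early.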