Learning to Infer Program Sketches
Authors: Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, Armando Solar-Lezama
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments: We provide the results of evaluating SKETCHADAPT in three test domains. |
| Researcher Affiliation | Collaboration | (1) MIT Brain and Cognitive Sciences; (2) MIT CSAIL; (3) MIT-IBM AI Lab; (4) Center for Brains, Minds and Machines (CBMM). |
| Pseudocode | Yes | Algorithm 1 SKETCHADAPT Training and Algorithm 2 SKETCHADAPT Evaluation |
| Open Source Code | No | The paper does not provide an explicit statement or a link to its own open-source code. |
| Open Datasets | Yes | We use the list processing DSL from Balog et al. (2016)... As our test corpus, we used string editing problems from the SyGuS (Alur et al., 2016) program synthesis competition, and string editing tasks used in Ellis et al. (2018)... Our final evaluation domain is the AlgoLisp DSL and dataset, introduced in Polosukhin & Skidanov (2018). |
| Dataset Splits | Yes | We trained our model on programs of length 3, and tested its performance on two datasets, one consisting of 100 programs of length 3, and the other with 100 length 4 programs. Therefore, we train our model on subsets of the data of various sizes to test generalization. Figure 4 and Table 4 depict our main results for this domain, testing all systems with a maximum timeout of 300 seconds per task. As in Bednarek et al. (2018), we filter the test and dev datasets for only those tasks for which reference programs satisfy the given specs. The filtered version is also used for Figure 4. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | When using a beam size of 10 on the full dataset, SKETCHADAPT and the Generator only RNN baseline far exceed previously reported state-of-the-art performance and achieve near-perfect accuracy, whereas the Synthesizers only model is unable to achieve high performance... testing all systems with a maximum timeout of 300 seconds per task... SketchAdapt, beam 100 (ours) and Generator only, beam 50 (RobustFill). |
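
The Dataset Splits row mentions filtering the test and dev sets down to tasks whose reference programs satisfy the given specs. The following is a minimal sketch of that kind of filter, not the authors' code: the task representation (a pair of a callable reference program and a list of input/output examples) and the names `satisfies_spec` and `filter_tasks` are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only; the paper does not define these.
Example = Tuple[object, object]           # (input, expected output)
Program = Callable[[object], object]      # a reference program as a callable
Task = Tuple[Program, Sequence[Example]]  # (reference program, spec)


def satisfies_spec(program: Program, examples: Sequence[Example]) -> bool:
    """Return True if the program reproduces every expected output."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # A reference program that crashes does not satisfy the spec.
            return False
    return True


def filter_tasks(tasks: Sequence[Task]) -> List[Task]:
    """Keep only tasks whose reference program satisfies its own examples."""
    return [task for task in tasks if satisfies_spec(task[0], task[1])]


# Toy tasks (made up for this sketch, not drawn from any dataset).
tasks: List[Task] = [
    (lambda xs: sorted(xs), [([3, 1, 2], [1, 2, 3])]),  # consistent
    (lambda xs: xs[::-1], [([1, 2, 3], [1, 2, 3])]),    # output mismatch
    (lambda xs: xs[5], [([1], 1)]),                     # raises IndexError
]
filtered = filter_tasks(tasks)
```

In this toy run only the first task survives the filter; the other two are dropped because their reference programs either disagree with the examples or fail to execute.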