Stepping Stones to Inductive Synthesis of Low-Level Looping Programs

Authors: Christopher D. Rosin2362-2370

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We present MAKESPEARE, a simple delayed-acceptance hillclimbing method that synthesizes low-level looping programs from input/output examples. ... The method performs well on a set of established benchmarks, and succeeds on the previously unsolved Collatz Numbers program synthesis problem. Additional benchmarks include the problem of rapidly sorting integer arrays, in which we observe the emergence of comb sort (a Shell sort variant that is empirically fast). ... Table 4 shows results (see Sec. 5.7 for TIS-100).
Researcher Affiliation Industry Christopher D. Rosin Parity Computing, Inc. San Diego, California, USA christopher.rosin@gmail.com
Pseudocode Yes Figure 1: Delayed Acceptance Hillclimbing. When applied to program synthesis, N is the period s best program, Y is search s current program, and Z is the new variant to eval. ... Figure 2: Local Search Operators ... Figure 3: Eval Scoring Function (see Sec 3.3 for TIS-100 detail)
Open Source Code Yes Implementation Code and data are available for download.3 https://github.com/Christopher Rosin/MAKESPEARE
Open Datasets Yes We use the explicit instances made available for download (Helmuth and Spector 2015a) rather than the protocol for generating new instances (Helmuth and Spector 2015c). ... Helmuth, T., and Spector, L. 2015a. General program synthesis training and test examples CSVs. github.com/thelmuth/Program-Synthesis-Benchmark-Data.
Dataset Splits No The paper describes the use of training and test sets but does not explicitly mention a separate 'validation' set or specific 'validation split' details for reproducibility.
Hardware Specification Yes The full set of Delayed Acceptance experiments below finish in 10 days, using 6 machines, each with a 4-core Intel i7-6700 CPU.
Software Dependencies No The paper mentions 'Dyn ASM (Pall 2017)' but does not provide a specific version number for this or any other software dependency.
Experiment Setup Yes With swap P {0, 0.1}, double P {0, 0.1, 0.5, 0.9}, copy P {0, 0.1, 0.5}, and I {3k, 10k, 25k, 75k, 150k}, a grid search was run on the preliminary benchmarks in Table 3. For each combination, Delayed Acceptance runs 100 times with max 300k evaluated programs per run. We find the parameters in Fig. 2 with I=75k have the most total runs that generalize perfectly, solving all 5 preliminary benchmarks.