reproducibilityindex.ai

Stepping Stones to Inductive Synthesis of Low-Level Looping Programs

Authors: Christopher D. Rosin2362-2370

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present MAKESPEARE, a simple delayed-acceptance hillclimbing method that synthesizes low-level looping programs from input/output examples. ... The method performs well on a set of established benchmarks, and succeeds on the previously unsolved Collatz Numbers program synthesis problem. Additional benchmarks include the problem of rapidly sorting integer arrays, in which we observe the emergence of comb sort (a Shell sort variant that is empirically fast). ... Table 4 shows results (see Sec. 5.7 for TIS-100).
Researcher Affiliation	Industry	Christopher D. Rosin Parity Computing, Inc. San Diego, California, USA christopher.rosin@gmail.com
Pseudocode	Yes	Figure 1: Delayed Acceptance Hillclimbing. When applied to program synthesis, N is the period s best program, Y is search s current program, and Z is the new variant to eval. ... Figure 2: Local Search Operators ... Figure 3: Eval Scoring Function (see Sec 3.3 for TIS-100 detail)
Open Source Code	Yes	Implementation Code and data are available for download.3 https://github.com/Christopher Rosin/MAKESPEARE
Open Datasets	Yes	We use the explicit instances made available for download (Helmuth and Spector 2015a) rather than the protocol for generating new instances (Helmuth and Spector 2015c). ... Helmuth, T., and Spector, L. 2015a. General program synthesis training and test examples CSVs. github.com/thelmuth/Program-Synthesis-Benchmark-Data.
Dataset Splits	No	The paper describes the use of training and test sets but does not explicitly mention a separate 'validation' set or specific 'validation split' details for reproducibility.
Hardware Specification	Yes	The full set of Delayed Acceptance experiments below ﬁnish in 10 days, using 6 machines, each with a 4-core Intel i7-6700 CPU.
Software Dependencies	No	The paper mentions 'Dyn ASM (Pall 2017)' but does not provide a specific version number for this or any other software dependency.
Experiment Setup	Yes	With swap P {0, 0.1}, double P {0, 0.1, 0.5, 0.9}, copy P {0, 0.1, 0.5}, and I {3k, 10k, 25k, 75k, 150k}, a grid search was run on the preliminary benchmarks in Table 3. For each combination, Delayed Acceptance runs 100 times with max 300k evaluated programs per run. We ﬁnd the parameters in Fig. 2 with I=75k have the most total runs that generalize perfectly, solving all 5 preliminary benchmarks.