Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Program Synthesis with Best-First Bottom-Up Search

Authors: Saqib Ameen, Levi H.S. Lelis

JAIR 2023 | Venue PDF | LLM Run Details

Research Type: Experimental
LLM Response: "Empirical results on string manipulation and bit-vector tasks show that Bee Search can outperform existing cost-guided BUS approaches when employing more complex domain-specific languages (DSLs); Bee Search and previous approaches perform equally well with simpler DSLs. Furthermore, our new cost function with Bee Search outperforms previous cost functions on string manipulation tasks."

Researcher Affiliation: Academia
LLM Response: "Saqib Ameen EMAIL, Levi H. S. Lelis EMAIL, Department of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada"

Pseudocode: Yes
LLM Response: "Algorithm 1: Uninformed Bottom-Up Search (BUS) ... Algorithm 2: Generic Cost-Guided Bottom-Up Search ... Algorithm 3: Heap Search Algorithm ... Algorithm 4: Brute Search ... Algorithm 5: Bee Search ... Algorithm 6: Next-Program procedure"

Open Source Code: No
LLM Response: The paper does not provide an explicit statement about releasing source code or a link to a code repository.

Open Datasets: Yes
LLM Response: "We evaluate Bee Search on three benchmark problem sets: (i) 205 string manipulation problems: 108 programming-by-example string problems from the 2017 SyGuS competition, 37 real problems faced by people and posted on Stack Overflow, and 60 spreadsheet problems from Exceljet (Lee et al., 2018) (we call this benchmark the SyGuS benchmark); (ii) 38 handcrafted string manipulation problems from Bustle's paper (Odena et al., 2021); and (iii) 27 bit-vector problems from the Hacker's Delight book (Warren, 2013)."

Dataset Splits: No
LLM Response: The paper describes using problem sets from various benchmarks (SyGuS, Bustle's paper, Hacker's Delight) but does not specify how these problem sets are split into training, validation, or test sets. It treats them as problems to be solved, not as datasets to be partitioned for model training and evaluation.

Hardware Specification: Yes
LLM Response: "All experiments were run on 2.4 GHz CPUs with 16 GB of RAM."

Software Dependencies: No
LLM Response: The paper does not specify any software dependencies with version numbers (e.g., Python 3.x, specific library versions).

Experiment Setup: Yes
LLM Response: "The algorithms had 120 minutes to solve each task. ... Probe uses an online learning scheme in which the probabilities of the PCFG are updated as more input-output examples are solved. Probe uses a parameter to determine when to update the probabilities; we tested the values of d = {1, 2, ..., 7} in all algorithms using wProbe, and report the results for the value that performed best for each algorithm. ... We report the average and standard deviation over 5 independent runs of all results involving the neural network of Bustle."
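The pseudocode listed above includes several cost-guided bottom-up search variants. As a rough illustration of the general technique only (not the paper's Bee Search, whose details are not reproduced here), the following is a minimal sketch of best-first bottom-up enumeration over a toy string DSL; the DSL (just an input variable, literals, and concatenation), the uniform cost model, and all function names are assumptions made for this example.

```python
# Hypothetical sketch of cost-guided (best-first) bottom-up enumeration for a
# toy string DSL. Programs are popped from a priority queue in order of cost
# and combined with previously enumerated (cheaper) programs.
import heapq
import itertools

def synthesize(examples, literals, max_cost=12):
    """Return the cheapest program (as a string) consistent with the
    input-output examples, or None if none is found within max_cost."""
    counter = itertools.count()      # tie-breaker so the heap never compares functions
    frontier = []                    # entries: (cost, tiebreak, expr, eval_fn)

    def push(cost, expr, fn):
        heapq.heappush(frontier, (cost, next(counter), expr, fn))

    # Terminals: the input variable x and string literals, each with cost 1.
    push(1, "x", lambda x: x)
    for lit in literals:
        push(1, repr(lit), lambda x, s=lit: s)

    seen = set()                     # observational-equivalence pruning
    enumerated = []                  # programs already popped, ordered by cost

    while frontier:
        cost, _, expr, fn = heapq.heappop(frontier)
        if cost > max_cost:
            break
        outputs = tuple(fn(inp) for inp, _ in examples)
        if outputs in seen:          # a cheaper equivalent program exists
            continue
        seen.add(outputs)
        if all(out == want for out, (_, want) in zip(outputs, examples)):
            return expr              # consistent with all examples
        # Combine with cheaper programs via concat (cost = sum of parts + 1).
        for c2, e2, f2 in enumerated:
            push(cost + c2 + 1, f"concat({expr},{e2})",
                 lambda x, a=fn, b=f2: a(x) + b(x))
            push(cost + c2 + 1, f"concat({e2},{expr})",
                 lambda x, a=f2, b=fn: a(x) + b(x))
        enumerated.append((cost, expr, fn))
    return None

# Example: learn to append "!" to the input from two examples.
print(synthesize([("hi", "hi!"), ("ok", "ok!")], ["!", "?"]))
```

Because programs are dequeued in nondecreasing cost order, the first program consistent with all examples is also a cheapest one, which is the defining property of the cost-guided BUS family the table refers to.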