Leveraging Language to Learn Program Abstractions and Search Heuristics

Authors: Catherine Wong, Kevin M Ellis, Joshua Tenenbaum, Jacob Andreas

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate LAPS on three different domains: string editing, compositional graphics drawing, and scene reasoning, which we choose to represent a diverse range of tasks and accompanying language (Fig. 2). In all three domains, we find that compared to the base synthesizer, LAPS learns and solves heldout synthesis problems faster (Table 1, Sec. 1-2), and produces higher-quality libraries that improve generalization even when natural language hints are not available at test time (Table 1, Sec. 3).
Researcher Affiliation | Academia | ¹MIT, ²Cornell University, ³Center for Brains, Minds and Machines (CBMM), MIT.
Pseudocode | Yes | Algorithm 1:
    Input: initial library L0, annotated training tasks (T, D)
    Initialize θ_L uniformly; training task solutions p ← {}
    for i = 1 … f do
        J_i ← fit θ_L and T(d_t | ρ) to (p, d_t)
        Q_i(ρ | t, d_t) ← train on (p, T, d_t) and samples from J_i
        p ← programs from search amortized with Q_i
        L_i ← abstractions optimized over (p, J_i)
    end for
    Return Q_f, L_f
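The iterative structure of Algorithm 1 can be sketched in Python. This is a toy illustration only: every function body and name below (e.g. `laps_sketch`, the dictionary-based "models") is an assumption made for readability, not the paper's implementation. Only the control flow mirrors the pseudocode: each iteration fits a joint generative model J_i, trains an amortized search heuristic Q_i, searches for task solutions p, and compresses shared structure into new library abstractions L_i.

```python
def laps_sketch(initial_library, tasks, annotations, iterations=3):
    """Toy skeleton of the LAPS outer loop (placeholder bodies throughout)."""
    library = list(initial_library)   # L_0: starting primitives
    solutions = {}                    # p: solved training tasks -> programs
    heuristic = None                  # Q: amortized neural search heuristic

    for i in range(iterations):
        # J_i: jointly fit the generative model over programs and language
        # to the current solutions (placeholder: just record the state).
        joint_model = {"library": list(library), "solved": dict(solutions)}

        # Q_i: train the search heuristic on solutions plus samples from J_i
        # (placeholder: a dict standing in for a trained network).
        heuristic = {"iteration": i, "model": joint_model}

        # p: search for programs solving the tasks, amortized with Q_i
        # (placeholder: pretend every task is solved with a stub program).
        for task, hint in zip(tasks, annotations):
            solutions[task] = f"prog({task})"

        # L_i: compress shared structure in the solutions into abstractions.
        library.append(f"abstraction_{i}")

    return heuristic, library         # Q_f, L_f

Q_f, L_f = laps_sketch(["map", "fold"], ["t1", "t2"], ["hint1", "hint2"])
```

After `iterations` passes, the sketch returns the final heuristic Q_f and a library L_f grown by one placeholder abstraction per iteration, mirroring how the real system alternates between generative-model fitting, heuristic training, search, and library compression.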
Open Source Code | Yes | The supplement contains full details (and code) to replicate all experiments, and additional qualitative results.
Open Datasets | Yes | String editing: structured string transformation problems taken from (Andreas et al., 2017) (n=1000 train; n=500 test). ... Compositional graphics: inverse graphics problems (n=200 train; n=111 test) ... Structured scene reasoning: inductive scene reasoning tasks (n=212 train; n=115 test) where each synthesis problem is specified by a structured input scene, and outputs can be a number (how many red rubber things are there?), a boolean value (are there more blue things than green?), or another scene (what if all of the red things turned blue?). This domain is modeled on CLEVR (Johnson et al., 2017a).
Dataset Splits | No | The paper provides train and test set sizes (e.g., 'n=1000 train; n=500 test' for String editing), but does not explicitly state validation split sizes or percentages.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided.
Software Dependencies | No | The paper mentions software components like 'bidirectional GRU' and 'functional programming primitives' but does not specify version numbers for these or other software dependencies.
Experiment Setup | No | The paper states that hyperparameters were determined via a search ('determined using a hyperparameter search with the baseline') but does not provide specific values for these or other training configurations in the main text.