Leveraging Language to Learn Program Abstractions and Search Heuristics

Authors: Catherine Wong, Kevin M Ellis, Joshua Tenenbaum, Jacob Andreas

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate LAPS on three different domains: string editing, compositional graphics drawing, and scene reasoning, which we choose to represent a diverse range of tasks and accompanying language (Fig. 2). In all three domains, we find that compared to the base synthesizer, LAPS learns and solves heldout synthesis problems faster (Table 1, Sec. 1-2), and produces higher-quality libraries that improve generalization even when natural language hints are not available at test time (Table 1, Sec. 3).
Researcher Affiliation | Academia | ¹MIT, ²Cornell University, ³Center for Brains, Minds and Machines (CBMM), MIT.
Pseudocode | Yes | Algorithm 1:
    Input: initial library L0, annotated training tasks (T, D)
    Initialize θ_L uniformly; training task solutions p ← {}
    for i = 1 … f do
        J_i ← fit θ_L and T(d_t | ρ) to (p, d_t)
        Q_i(ρ | t, d_t) ← train on (p, T, d_t) and samples from J_i
        p ← programs from search amortized with Q_i
        L_i ← abstractions optimized over (p, J_i)
    end for
    Return Q_f, L_f
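The iterative structure of Algorithm 1 can be sketched in Python. This is a toy illustration only: every function body and name below (e.g. `laps_sketch`, the dictionary-based "models") is an assumption made for readability, not the paper's implementation. Only the control flow mirrors the pseudocode: each iteration fits a joint generative model J_i, trains an amortized search heuristic Q_i, searches for task solutions p, and compresses shared structure into new library abstractions L_i.

```python
def laps_sketch(initial_library, tasks, annotations, iterations=3):
    """Toy skeleton of the LAPS outer loop (placeholder bodies throughout)."""
    library = list(initial_library)   # L_0: starting primitives
    solutions = {}                    # p: solved training tasks -> programs
    heuristic = None                  # Q: amortized neural search heuristic

    for i in range(iterations):
        # J_i: jointly fit the generative model over programs and language
        # to the current solutions (placeholder: just record the state).
        joint_model = {"library": list(library), "solved": dict(solutions)}

        # Q_i: train the search heuristic on solutions plus samples from J_i
        # (placeholder: a dict standing in for a trained network).
        heuristic = {"iteration": i, "model": joint_model}

        # p: search for programs solving the tasks, amortized with Q_i
        # (placeholder: pretend every task is solved with a stub program).
        for task, hint in zip(tasks, annotations):
            solutions[task] = f"prog({task})"

        # L_i: compress shared structure in the solutions into abstractions.
        library.append(f"abstraction_{i}")

    return heuristic, library         # Q_f, L_f

Q_f, L_f = laps_sketch(["map", "fold"], ["t1", "t2"], ["hint1", "hint2"])
```

After `iterations` passes, the sketch returns the final heuristic Q_f and a library L_f grown by one placeholder abstraction per iteration, mirroring how the real system alternates between generative-model fitting, heuristic training, search, and library compression.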
Open Source Code | Yes | The supplement contains full details (and code) to replicate all experiments, and additional qualitative results.
Open Datasets | Yes | String editing: structured string transformation problems taken from (Andreas et al., 2017) (n=1000 train; n=500 test). ... Compositional graphics: inverse graphics problems (n=200 train; n=111 test) ... Structured scene reasoning: inductive scene reasoning tasks (n=212 train; n=115 test) where each synthesis problem is specified by a structured input scene, and outputs can be a number (how many red rubber things are there?), a boolean value (are there more blue things than green?), or another scene (what if all of the red things turned blue?). This domain is modeled on CLEVR (Johnson et al., 2017a).
Dataset Splits | No | The paper provides train and test set sizes (e.g., 'n=1000 train; n=500 test' for String editing), but does not explicitly state validation split sizes or percentages.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided.
Software Dependencies | No | The paper mentions software components like 'bidirectional GRU' and 'functional programming primitives' but does not specify version numbers for these or other software dependencies.
Experiment Setup | No | The paper states that hyperparameters were determined via a search ('determined using a hyperparameter search with the baseline') but does not provide specific values for these or other training configurations in the main text.