CrossBeam: Learning to Search in Bottom-Up Program Synthesis

Authors: Kensen Shi, Hanjun Dai, Kevin Ellis, Charles Sutton

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CROSSBEAM in two very different domains, string manipulation and logic programming. We observe that CROSSBEAM learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art. [Section 4, Experiments] Our experiments compare CROSSBEAM to state-of-the-art methods in two very different domains, string manipulation and inductive logic programming.
Researcher Affiliation | Collaboration | Kensen Shi (Google Research, kshi@google.com); Hanjun Dai (Google Research, hadai@google.com); Kevin Ellis (Cornell University, kellis@cornell.edu); Charles Sutton (Google Research, charlessutton@google.com)
Pseudocode | Yes | Algorithm 1: The CROSSBEAM algorithm, with training data generation in blue (a hedged sketch of the search loop appears below the table)
Open Source Code | Yes | https://github.com/google-research/crossbeam
Open Datasets | Yes | For the string manipulation domain, we use the same DSL, success criteria (satisfying the I/O examples), training task generation procedure, and test tasks as used in BUSTLE (Odena et al., 2021). We run CROSSBEAM on both sets of test tasks used to evaluate BUSTLE, with a search budget of 50,000 candidate programs. Our synthetic test set contains tasks of varying size, i.e., the minimum number of nodes in the expression tree of a solution. For each size in {5, ..., 19}, we randomly select 50 tasks among all possible tasks of that size, excluding those appearing in the training dataset. Our handcrafted test set, inspired by Peano arithmetic, comprises 30 common mathematical relationships... (a sketch of this per-size sampling appears below the table)
Dataset Splits | Yes | After every 10K training steps, we evaluate on synthetic validation tasks to choose the best model. For the string manipulation domain, we use the same DSL, success criteria (satisfying the I/O examples), training task generation procedure, and test tasks as used in BUSTLE (Odena et al., 2021).
Hardware Specification | Yes | We train CROSSBEAM with a distributed synchronized Adam optimizer on 8 V100 GPUs, with gradient accumulation of 4 for an effective batch size of 32.
Software Dependencies | No | The paper mentions 'distributed synchronized Adam optimizer', 'LSTM', 'MLP', and 'GREAT Transformer' but does not specify versions of programming languages, libraries, or frameworks (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | We train CROSSBEAM with a distributed synchronized Adam optimizer on 8 V100 GPUs, with gradient accumulation of 4 for an effective batch size of 32. We generate 10M training tasks for string manipulation and 1M for logic programming. The models train for at most 500K steps or 5 days, whichever limit is reached first. We use K = 10 and learning rate 1 × 10⁻⁴ during training. (A gradient-accumulation training sketch appears below the table.)
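
Relating to the Pseudocode row: Algorithm 1 in the paper interleaves bottom-up execution of DSL operations with a learned model that proposes which already-explored values to combine next. Below is a minimal illustrative sketch of that search loop, not the authors' implementation: the Task and Operation interfaces and model.propose_argument_lists are assumed placeholder names, and the training-data generation shown in blue in the paper is omitted.

```python
# Illustrative sketch of a CrossBeam-style bottom-up search loop.
# `task`, `operations`, and `model` use hypothetical interfaces, not the
# actual crossbeam API.

def crossbeam_search(task, operations, model, k=10, budget=50_000):
    """Bottom-up search guided by a learned proposal model (sketch)."""
    explored = list(task.initial_values())  # input variables and constants
    num_candidates = 0
    while num_candidates < budget:
        progress = False
        for op in operations:
            # The learned model proposes up to k argument lists for `op`,
            # drawn from values already in `explored` (beam of size K = 10).
            for args in model.propose_argument_lists(op, explored, task, beam_size=k):
                num_candidates += 1
                progress = True
                value = op.execute(args)           # run on the I/O examples
                if value is None or value in explored:
                    continue                       # execution error or duplicate value
                if task.matches_output(value):
                    return value                   # matches all outputs: solved
                explored.append(value)             # keep the value for later reuse
                if num_candidates >= budget:
                    return None
        if not progress:
            return None                            # model proposed nothing new
    return None
```

In the paper, the proposals come from a neural model conditioned on the I/O examples and the values explored so far, and the same loop is replayed on generated training tasks to collect training data; this sketch shows only the inference-time search.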
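
Relating to the Open Datasets row: the synthetic test set draws, for each solution size from 5 to 19, 50 tasks uniformly from all tasks of that size while excluding anything seen during training. A small sketch of that selection step under assumed helpers (enumerate_tasks_of_size and the training_tasks set are hypothetical, not the paper's pipeline):

```python
import random

def sample_synthetic_test_set(enumerate_tasks_of_size, training_tasks,
                              sizes=range(5, 20), per_size=50, seed=0):
    """Pick `per_size` unseen tasks for each solution size in 5..19 (sketch)."""
    rng = random.Random(seed)
    test_set = []
    for size in sizes:
        # All tasks of this size that never appeared in the training data.
        candidates = [t for t in enumerate_tasks_of_size(size)
                      if t not in training_tasks]
        test_set.extend(rng.sample(candidates, per_size))
    return test_set
```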
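
Relating to the Hardware Specification and Experiment Setup rows: an effective batch size of 32 from 8 GPUs with gradient accumulation of 4 is consistent with a per-device micro-batch of 1 (8 × 4 × 1 = 32; the per-device batch is an assumption, since the paper states only the totals). The following is a generic single-device sketch of Adam with gradient accumulation at the reported learning rate of 1e-4; model, data_loader, and loss_fn are placeholders, and the distributed synchronization across GPUs is omitted.

```python
import torch

def train(model, data_loader, loss_fn, max_steps=500_000,
          accumulation_steps=4, lr=1e-4):
    """Adam with gradient accumulation (illustrative, single-device)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    optimizer.zero_grad()
    for step, batch in enumerate(data_loader):
        # Scale so the accumulated gradient equals the mean over micro-batches.
        loss = loss_fn(model, batch) / accumulation_steps
        loss.backward()                      # gradients add up across micro-batches
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()                 # one update per 4 micro-batches
            optimizer.zero_grad()
        if (step + 1) // accumulation_steps >= max_steps:
            break                            # stop after the step budget
```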