ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

Authors: Kensen Shi, Joey Hong, Yinlin Deng, Pengcheng Yin, Manzil Zaheer, Charles Sutton

ICLR 2024

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: To evaluate this approach, we introduce a new meta-benchmark for measuring the compositional generalization abilities of program synthesizers. ... We experiment with Transformers trained from scratch and with LLMs using few-shot prompting. ... Figure 2: Compositional generalization results with beam size 10. Error bars denote 95% confidence intervals of the mean across 5 trials.
(The confidence-interval computation is sketched after this table.)

Reproducibility Variable: Researcher Affiliation
Result: Collaboration
LLM Response: Kensen Shi (Google DeepMind, kshi@google.com); Joey Hong (UC Berkeley, joey_hong@berkeley.edu); Yinlin Deng (University of Illinois Urbana-Champaign, yinlind2@illinois.edu); Pengcheng Yin (Google DeepMind, pcyin@google.com); Manzil Zaheer (Google DeepMind, manzilzaheer@google.com); Charles Sutton (Google DeepMind, charlessutton@google.com)

Reproducibility Variable: Pseudocode
Result: Yes
LLM Response: Algorithm 1 ExeDec: synthesis via decomposition in the execution space.
(A hedged sketch of this decomposition loop appears after this table.)

Reproducibility Variable: Open Source Code
Result: Yes
LLM Response: Our code, datasets, and checkpoints for the Transformer models trained from scratch are available at https://github.com/google-deepmind/exedec.

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: RobustFill (Devlin et al., 2017) and DeepCoder (Balog et al., 2017).

Reproducibility Variable: Dataset Splits
Result: Yes
LLM Response: Our meta-benchmark describes train-test splits for 5 different types of compositional generalization... For Length-Generalization, we train on problems of lengths 1 to n and test on lengths n + 1 to m (where m > n). ... For Length-Generalization, we train on programs of length 1 to 6 inclusive and test on programs of length 7 to 10.
(A minimal split filter is sketched after this table.)

Reproducibility Variable: Hardware Specification
Result: Yes
LLM Response: Training took about 1 day for RobustFill (or about 5 hours for DeepCoder) with 8 TPU v2 accelerators per model.

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper mentions using the 'Adam optimizer' and details the model architecture, but it does not name specific software libraries with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x).

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: We used an embedding dimension of 512, hidden dimension of 1024, 3 layers, and 4 attention heads. ... We train with the Adam optimizer with a learning rate of 2e-4 with linear warmup for 16,000 steps and square root decay, with a batch size of 128 and 500K training steps.
(The learning-rate schedule is sketched after this table.)
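The error bars quoted under Research Type (95% confidence intervals of the mean across 5 trials) correspond to a standard interval around the mean of the per-trial scores. A minimal sketch using a t-interval; the five per-trial success rates below are hypothetical placeholders, not the paper's numbers, and the paper may compute its intervals differently:

```python
import math
import statistics

# Hypothetical per-trial success rates (NOT the paper's numbers),
# standing in for 5 independent training runs of one setting.
trials = [0.62, 0.58, 0.65, 0.60, 0.63]

n = len(trials)
mean = statistics.mean(trials)
sem = statistics.stdev(trials) / math.sqrt(n)  # standard error of the mean

# Two-sided 95% t critical value for n - 1 = 4 degrees of freedom.
t_crit = 2.776

print(f"mean = {mean:.3f}, "
      f"95% CI = [{mean - t_crit * sem:.3f}, {mean + t_crit * sem:.3f}]")
```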
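Algorithm 1 is cited only by its caption, so the following is a hedged, high-level sketch of synthesis via decomposition in the execution space. The names `predict_subgoals`, `synthesize_subprogram`, and `execute` are hypothetical placeholders for the paper's two learned models and the DSL interpreter, not the repository's API, and DSL-specific details of how per-step results combine are omitted:

```python
from typing import Callable, List, Optional


def exedec_sketch(
    inputs: List[object],             # example inputs
    outputs: List[object],            # example outputs
    predict_subgoals: Callable,       # learned subgoal model (placeholder)
    synthesize_subprogram: Callable,  # learned synthesizer model (placeholder)
    execute: Callable,                # DSL interpreter for one subprogram
    max_steps: int = 10,
) -> Optional[list]:
    """Sketch: repeatedly predict an execution subgoal for each example,
    synthesize a subprogram that reaches it, and advance the execution state."""
    program = []
    state = list(inputs)  # current execution state, one entry per example
    for _ in range(max_steps):
        if state == outputs:  # every example reaches its target output
            return program
        # Predict the target execution results of the next subprogram.
        subgoals = predict_subgoals(state, outputs)
        # Synthesize a subprogram mapping the current state to those subgoals.
        subprogram = synthesize_subprogram(state, subgoals)
        program.append(subprogram)
        # Execute the subprogram to move forward in execution space.
        state = [execute(subprogram, s) for s in state]
    return None  # no solution within the step budget
```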
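The Length-Generalization split quoted under Dataset Splits (train on program lengths 1 to 6, test on lengths 7 to 10) amounts to a simple filter over problems by ground-truth program length. This is a generic illustration, not the repository's data pipeline, and `program_length` is a hypothetical field name:

```python
TRAIN_LENGTHS = range(1, 7)   # lengths 1..6 inclusive
TEST_LENGTHS = range(7, 11)   # lengths 7..10 inclusive


def split_by_length(problems):
    """Partition synthesis problems into the length-generalization split."""
    train = [p for p in problems if p["program_length"] in TRAIN_LENGTHS]
    test = [p for p in problems if p["program_length"] in TEST_LENGTHS]
    return train, test


# Usage with toy records ("program_length" is a hypothetical field).
problems = [{"id": i, "program_length": (i % 10) + 1} for i in range(20)]
train_set, test_set = split_by_length(problems)
print(len(train_set), "train problems,", len(test_set), "test problems")
```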
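The training configuration quoted under Experiment Setup (Adam, peak learning rate 2e-4, linear warmup for 16,000 steps, square root decay, batch size 128, 500K steps) implies a schedule along the following lines. The exact decay formula is an assumption based on the common warmup-then-inverse-square-root pattern, not taken verbatim from the paper or the repository:

```python
import math

PEAK_LR = 2e-4
WARMUP_STEPS = 16_000
TOTAL_STEPS = 500_000


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then inverse-square-root decay (assumed form)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    return PEAK_LR * math.sqrt(WARMUP_STEPS / (step + 1))


# Spot-check a few points on the schedule.
for s in (0, 8_000, 16_000, 100_000, TOTAL_STEPS):
    print(f"step {s:>7}: lr = {learning_rate(s):.2e}")
```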