ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis
Authors: Kensen Shi, Joey Hong, Yinlin Deng, Pengcheng Yin, Manzil Zaheer, Charles Sutton
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate this approach, we introduce a new meta-benchmark for measuring the compositional generalization abilities of program synthesizers. ... We experiment with Transformers trained from scratch and with LLMs using few-shot prompting. ... Figure 2: Compositional generalization results with beam size 10. Error bars denote 95% confidence intervals of the mean across 5 trials. |
| Researcher Affiliation | Collaboration | Kensen Shi, Google DeepMind (kshi@google.com); Joey Hong, UC Berkeley (joey_hong@berkeley.edu); Yinlin Deng, University of Illinois Urbana-Champaign (yinlind2@illinois.edu); Pengcheng Yin, Google DeepMind (pcyin@google.com); Manzil Zaheer, Google DeepMind (manzilzaheer@google.com); Charles Sutton, Google DeepMind (charlessutton@google.com) |
| Pseudocode | Yes | Algorithm 1 ExeDec: synthesis via decomposition in the execution space. (A hedged sketch of this loop appears after the table.) |
| Open Source Code | Yes | Our code, datasets, and checkpoints for the Transformer models trained from scratch are available at https://github.com/google-deepmind/exedec. |
| Open Datasets | Yes | RobustFill (Devlin et al., 2017) and DeepCoder (Balog et al., 2017). |
| Dataset Splits | Yes | Our meta-benchmark describes train-test splits for 5 different types of compositional generalization... For Length-Generalization, we train on problems of lengths 1 to n and test on lengths n + 1 to m (where m > n). ... For Length-Generalization, we train on programs of length 1 to 6 inclusive and test on programs of length 7 to 10. (A small split sketch appears after the table.) |
| Hardware Specification | Yes | Training took about 1 day for RobustFill (or about 5 hours for DeepCoder) with 8 TPU v2 accelerators per model. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and details model architecture, but it does not provide specific software library names with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We used an embedding dimension of 512, hidden dimension of 1024, 3 layers, and 4 attention heads. ... We train with the Adam optimizer with a learning rate of 2e-4 with linear warmup for 16,000 steps and square root decay, with a batch size of 128 and 500K training steps. (A hedged sketch of this learning-rate schedule appears after the table.) |
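
The Pseudocode row refers to Algorithm 1, ExeDec's synthesis loop, which alternates between predicting the next execution subgoal and synthesizing a subprogram that achieves it. The following is a minimal greedy sketch of that control flow in Python; the names `subgoal_model`, `subprogram_model`, `execute`, and `combine` are illustrative placeholders, not the authors' API, and the real algorithm uses beam search over both models.

```python
def exedec_synthesize(inputs, outputs, subgoal_model, subprogram_model,
                      execute, combine, max_steps=10):
    """Hedged sketch of ExeDec-style synthesis via execution decomposition.

    This greedy loop only illustrates the overall control flow described in
    the paper; it is not the authors' implementation.
    """
    program_parts = []
    current_states = inputs  # execution states start at the I/O example inputs
    for _ in range(max_steps):
        # 1. Predict an execution subgoal: the intermediate outputs that the
        #    next subprogram should produce for each example.
        subgoal = subgoal_model(current_states, outputs)
        # 2. Synthesize a subprogram mapping the current states to the subgoal.
        subprogram = subprogram_model(current_states, subgoal)
        # 3. Execute the subprogram to obtain new execution states.
        current_states = [execute(subprogram, state) for state in current_states]
        program_parts.append(subprogram)
        # 4. Stop once the composed program reproduces the target outputs.
        if current_states == outputs:
            return combine(program_parts)
    return None  # no program found within the step budget
```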
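As a concrete illustration of the Length-Generalization split quoted in the Dataset Splits row, the snippet below partitions tasks by program length (train on lengths 1 to 6, test on lengths 7 to 10). The `tasks` record format and the `program_length` field are assumptions made for illustration, not the authors' data format.

```python
# Hedged sketch: split synthetic tasks by program length, following the quoted
# Length-Generalization protocol (train on lengths 1-6, test on lengths 7-10).
def length_generalization_split(tasks, max_train_len=6, max_test_len=10):
    train = [t for t in tasks if 1 <= t["program_length"] <= max_train_len]
    test = [t for t in tasks if max_train_len < t["program_length"] <= max_test_len]
    return train, test

# Example usage with toy records.
tasks = [{"program_length": n, "program": f"op_{n}"} for n in range(1, 11)]
train_tasks, test_tasks = length_generalization_split(tasks)
assert all(t["program_length"] <= 6 for t in train_tasks)
assert all(7 <= t["program_length"] <= 10 for t in test_tasks)
```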
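The Experiment Setup row quotes Adam with a 2e-4 learning rate, linear warmup for 16,000 steps, and square-root decay over 500K steps. One common reading of that description is linear warmup to the peak rate followed by inverse-square-root decay; the sketch below implements that interpretation in plain Python and is an assumption, not the exact schedule from the ExeDec codebase.

```python
import math

# Hedged sketch of the quoted schedule: linear warmup for 16,000 steps to a
# peak learning rate of 2e-4, then inverse-square-root decay. This is one
# interpretation of "linear warmup ... and square root decay", not necessarily
# the authors' exact implementation.
PEAK_LR = 2e-4
WARMUP_STEPS = 16_000

def learning_rate(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS          # linear warmup
    return PEAK_LR * math.sqrt(WARMUP_STEPS / step)   # inverse-sqrt decay

# Spot-check a few points on the schedule (warmup end, mid-training, final step).
for s in (1_000, 16_000, 100_000, 500_000):
    print(s, f"{learning_rate(s):.2e}")
```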