Compositional Generalization via Neural-Symbolic Stack Machines
Authors: Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NeSS on four benchmarks that require compositional generalization: (1) the SCAN benchmark discussed above; (2) the task of few-shot learning of compositional instructions [28]; (3) the compositional machine translation task [26]; and (4) the context-free grammar parsing tasks [8]. |
| Researcher Affiliation | Collaboration | Xinyun Chen UC Berkeley xinyun.chen@berkeley.edu; Chen Liang, Adams Wei Yu Google Brain {crazydonkey,adamsyuwei}@google.com; Dawn Song UC Berkeley dawnsong@cs.berkeley.edu; Denny Zhou Google Brain dennyzhou@google.com |
| Pseudocode | No | The paper describes the instruction semantics of the stack machine in Table 1 and provides an illustrative example in Figure 1, but it does not include a formal pseudocode or algorithm block for the overall NeSS system or its training procedure. A hypothetical sketch of such a machine loop appears after the table. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code for its methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate NeSS on four benchmarks that require compositional generalization: (1) the SCAN benchmark discussed above; (2) the task of few-shot learning of compositional instructions [28]; (3) the compositional machine translation task [26]; and (4) the context-free grammar parsing tasks [8]. |
| Dataset Splits | Yes | Evaluation setup. Similar to prior work [27, 16, 38], we evaluate the following four settings. (1) Length generalization: the output sequences in the training set include at most 22 actions, while the output lengths in the test set are between 24 and 48. (4) Simple split: randomly split samples into training and test sets. A minimal sketch of the length-based split appears after the table. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or specific cloud instance types used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, or other library versions) that were used in the experiments. |
| Experiment Setup | No | The paper states, 'We present the setup and key results below, and defer more experimental details to the supplementary material,' indicating that detailed setup information, such as hyperparameters, is deferred to the supplementary material rather than reported in the main text. |
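
Since the paper conveys the machine through Table 1 semantics and a worked example rather than an algorithm block, the following is a minimal, purely illustrative sketch of what a neural-controlled stack machine loop could look like. The operation names and the `controller.predict` interface are hypothetical stand-ins, not the paper's actual Table 1 instruction set or released code.

```python
# Hypothetical sketch of a neural-symbolic stack machine loop. The paper gives
# no pseudocode; the op names (SHIFT, REDUCE, FINAL) and the controller
# interface below are illustrative assumptions, not NeSS's actual semantics.

def run_machine(controller, tokens):
    queue = list(tokens)   # unprocessed input tokens
    stack = []             # partial output segments built so far
    while True:
        # The neural controller scores the next symbolic operation
        # given the current machine state.
        op, arg = controller.predict(queue, stack)
        if op == "SHIFT":
            # Move the next input token onto the stack as a new segment.
            stack.append([queue.pop(0)])
        elif op == "REDUCE":
            # Rewrite the stack top with a rule `arg`, e.g. a callable
            # mapping source tokens to target actions.
            stack[-1] = arg(stack[-1])
        elif op == "FINAL":
            # Terminate and emit the accumulated output sequence.
            return [tok for seg in stack for tok in seg]
        else:
            raise ValueError(f"unknown operation: {op}")
```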
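
Likewise, because no code or split scripts are released, here is a minimal sketch of the length-generalization split quoted in the Dataset Splits row, assuming the public SCAN file format of `IN: <command> OUT: <actions>` lines; the function names and file handling are illustrative assumptions. Only the 22 and 24-48 length thresholds come directly from the quoted evaluation setup.

```python
# Illustrative sketch only: NeSS's official code is not released (see the
# Open Source Code row), so the "IN: ... OUT: ..." line format is taken from
# the public SCAN dataset, not the authors' pipeline.

def load_scan(path):
    """Parse SCAN-style lines such as 'IN: jump twice OUT: I_JUMP I_JUMP'."""
    examples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line.startswith("IN:"):
                continue
            command, actions = line[len("IN:"):].split("OUT:")
            examples.append((command.split(), actions.split()))
    return examples

def length_split(examples, max_train_len=22, min_test_len=24, max_test_len=48):
    """Length-generalization split quoted above: train on action sequences
    of at most 22 tokens, test on lengths between 24 and 48."""
    train = [ex for ex in examples if len(ex[1]) <= max_train_len]
    test = [ex for ex in examples
            if min_test_len <= len(ex[1]) <= max_test_len]
    return train, test
```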