Compositional Generalization by Learning Analytical Expressions
Authors: Qian Liu, Shengnan An, Jian-Guang Lou, Bei Chen, Zeqi Lin, Yan Gao, Bin Zhou, Nanning Zheng, Dongmei Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the well-known benchmark SCAN demonstrate that our model seizes a great ability of compositional generalization, solving all challenges addressed by previous works with 100% accuracies. |
| Researcher Affiliation | Collaboration | Beihang University, Beijing, China; Xi'an Jiaotong University, Xi'an, China; Microsoft Research, Beijing, China |
| Pseudocode | No | The paper describes the model’s processes (Composer and Solver) with textual explanations and figures, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We open-source our code at https://github.com/microsoft/ContextualSP. |
| Open Datasets | Yes | As one of the most important benchmarks, the SCAN dataset is proposed to evaluate the compositional generalization ability of translation models [19]. Systematicity is evaluated on Add Jump, Around Right and Length of SCAN [19], while distribution-based systematicity is assessed on MCD splits of SCAN [17]. Productivity is evaluated on the SCAN-ext dataset. |
| Dataset Splits | No | The paper states, “We follow previous works to split datasets for all tasks,” implying standard splits are used, but it does not explicitly provide percentages or sample counts for a validation set in the main text. It only details train/test splits for specific tasks like Add Jump. |
| Hardware Specification | Yes | Our model is trained on a single Tesla-P100 (16GB) and the training time for a single run is about 20-25 hours. |
| Software Dependencies | No | The paper mentions "Our model is implemented in PyTorch [28]" and "updated via the AdaDelta [40] optimizer," but it does not provide specific version numbers for PyTorch or AdaDelta. |
| Experiment Setup | Yes | Dimensions of word embeddings, hidden states, key vectors and value vectors are set as 128. Hyperparameters γ and N are set as 0.5 and 10 respectively. All parameters are randomly initialized and updated via the AdaDelta [40] optimizer, with a learning rate of 0.1 for Composer and 1.0 for Solver. Meanwhile, as done in previous works [14], we introduce a regularization term to prevent our model from overfitting in the early stage of training. Its weight is set to 0.1 at the beginning, and exponentially anneals with a rate 0.5 as the lesson increases. |
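
To make the reported setup concrete, below is a minimal sketch of how these hyperparameters could be wired up in PyTorch. The LSTM placeholders and the `reg_weight` helper are assumptions for illustration only; just the 128-dimensional embeddings, the AdaDelta optimizer with learning rates 0.1 (Composer) and 1.0 (Solver), γ = 0.5, N = 10, and the regularization-weight annealing (0.1 initial, rate 0.5 per lesson) come from the paper, and the actual architectures are in the authors' repository.

```python
import torch

# Illustrative sketch of the reported training configuration.
# The LSTM modules below are placeholders for the paper's Composer and Solver;
# the real implementations live at https://github.com/microsoft/ContextualSP.
EMBED_DIM = 128   # word embeddings, hidden states, key and value vectors
GAMMA = 0.5       # hyperparameter gamma reported in the paper
N = 10            # hyperparameter N reported in the paper

composer = torch.nn.LSTM(EMBED_DIM, EMBED_DIM)  # placeholder for the Composer
solver = torch.nn.LSTM(EMBED_DIM, EMBED_DIM)    # placeholder for the Solver

# AdaDelta optimizers with the learning rates reported in the paper.
composer_opt = torch.optim.Adadelta(composer.parameters(), lr=0.1)
solver_opt = torch.optim.Adadelta(solver.parameters(), lr=1.0)

def reg_weight(lesson: int, init: float = 0.1, rate: float = 0.5) -> float:
    """Regularization weight: 0.1 at the start, exponentially annealed with
    rate 0.5 as the lesson index increases (assumed form: init * rate**lesson)."""
    return init * (rate ** lesson)
```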