Revisiting Iterative Back-Translation from the Perspective of Compositional Generalization
Authors: Yinuo Guo, Hualei Zhu, Zeqi Lin, Bei Chen, Jian-Guang Lou, Dongmei Zhang
AAAI 2021, pp. 7601–7609 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first empirically show that iterative back-translation substantially improves the performance on compositional generalization benchmarks (CFQ and SCAN). ... Our experiments are also conducted on these two benchmarks. ... We report the experimental results in Table 1 (SCAN) and 2 (CFQ). |
| Researcher Affiliation | Collaboration | 1) Key Laboratory of Computational Linguistics, School of EECS, Peking University; 2) School of Computer Science and Engineering, Beihang University; 3) Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1 Iterative Back-Translation ... Algorithm 2 Curriculum Iterative Back-Translation (a minimal sketch of the Algorithm 1 loop appears after this table) |
| Open Source Code | No | The paper states: 'We implement iterative back-translation based on the code of UNdreaMT (Artetxe et al. 2018).' and provides a link to the UNdreaMT GitHub repository (https://github.com/artetxem/undreamt). This is a third-party codebase the authors built on, not a released open-source implementation of the method described in this paper. |
| Open Datasets | Yes | Two benchmarks, SCAN (Lake and Baroni 2018) and CFQ (Keysers et al. 2020), have been proposed for measuring the compositional generalization ability of different machine learning-based NLU systems. Our experiments are also conducted on these two benchmarks. |
| Dataset Splits | No | For SCAN, the paper states: 'we randomly hold out half of the original test data as the dev data' (a sketch of this split follows the table). For CFQ, it states 'CFQ dataset is splitted [sic] into train and test sets based on two principles' but does not specify a distinct validation (dev) split used in the experiments. Complete, explicit train/test/validation splits are therefore not provided for all experimental datasets. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions that the implementation is 'based on the code of UNdreaMT', but it does not specify version numbers for core software dependencies such as Python, PyTorch/TensorFlow, CUDA, or other libraries used in the experimental setup. |
| Experiment Setup | Yes | For CFQ, we use a 2-layer GRU encoder-decoder model equipped with attention. We set the size of both word embedding and hidden states to 300. We use a dropout layer with the rate of 0.5 and the training process lasts 30000 iterations with batch size 128. For SCAN, we also use a 2-layer GRU encoder-decoder model with attention. Both the embedding size and hidden size are set to 200. We use a dropout layer with the rate of 0.5 and the training process lasts 35000 iterations with batch size 64. We set K = 5000 steps in Algorithm 1 for both CFQ and SCAN benchmarks. ... We evaluate the performance of curriculum iterative back-translation with varying c (the number of steps each stage should be trained): 2000/2500/3000/3500/4000. (A configuration sketch of the reported model follows the table.) |
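To make the Pseudocode entry concrete, here is a minimal Python sketch of what the Algorithm 1 (Iterative Back-Translation) loop looks like. It is an illustration under stated assumptions, not the authors' implementation: the `Seq2Seq`-style objects and their `train_step`/`translate` methods are hypothetical placeholders, and the interleaving of supervised and back-translation updates is one plausible reading of the algorithm.

```python
# Minimal sketch of iterative back-translation (Algorithm 1).
# `fwd`/`bwd` are hypothetical seq2seq models with `train_step(src, tgt)`
# and `translate(seq)` methods; this is not the authors' code.

def iterative_back_translation(fwd, bwd, parallel, mono_src, mono_tgt, K=5000):
    """fwd: source->target model, bwd: target->source model.

    parallel:  list of (src, tgt) pairs (the labeled training data)
    mono_src:  unpaired source-side sequences (e.g. commands / questions)
    mono_tgt:  unpaired target-side sequences (e.g. action sequences / SPARQL)
    K:         number of training steps (the paper uses K = 5000)
    """
    for step in range(K):
        # 1. Supervised step on the real parallel data, in both directions.
        src, tgt = parallel[step % len(parallel)]
        fwd.train_step(src, tgt)
        bwd.train_step(tgt, src)

        # 2. Back-translate monolingual target data with the backward model,
        #    then train the forward model on the resulting pseudo pair.
        t = mono_tgt[step % len(mono_tgt)]
        fwd.train_step(bwd.translate(t), t)

        # 3. Symmetrically, back-translate monolingual source data.
        s = mono_src[step % len(mono_src)]
        bwd.train_step(s, fwd.translate(s))
```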
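The SCAN dev split described in the Dataset Splits row ('randomly hold out half of the original test data as the dev data') can be reproduced along these lines; the seed and function name are assumptions, since the paper does not give them:

```python
import random

def split_test_into_dev(test_examples, seed=0):
    # Randomly hold out half of the original test set as dev data
    # (per the paper's description); the seed is an assumption.
    rng = random.Random(seed)
    shuffled = list(test_examples)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (dev set, remaining test set)
```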
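The Experiment Setup row fully specifies the model shape for CFQ, so a configuration sketch is straightforward. The PyTorch sketch below instantiates a 2-layer GRU encoder-decoder with attention, embedding/hidden size 300, and dropout 0.5, as reported; the attention variant (dot-product, Luong-style) and all class/variable names are assumptions, since the paper does not specify them.

```python
# Sketch of the reported CFQ configuration: 2-layer GRU encoder-decoder
# with attention, embedding/hidden size 300, dropout 0.5. The dot-product
# attention variant is an assumption; the paper does not name one.

import torch
import torch.nn as nn

class GRUEncoder(nn.Module):
    def __init__(self, vocab_size, dim=300, layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.drop = nn.Dropout(dropout)
        self.rnn = nn.GRU(dim, dim, num_layers=layers,
                          dropout=dropout, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len)
        outputs, hidden = self.rnn(self.drop(self.embed(src)))
        return outputs, hidden                    # outputs: (batch, src_len, dim)

class AttnGRUDecoder(nn.Module):
    def __init__(self, vocab_size, dim=300, layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.drop = nn.Dropout(dropout)
        self.rnn = nn.GRU(dim, dim, num_layers=layers,
                          dropout=dropout, batch_first=True)
        self.out = nn.Linear(2 * dim, vocab_size)

    def forward(self, tgt, hidden, enc_outputs):  # tgt: (batch, tgt_len)
        dec_out, hidden = self.rnn(self.drop(self.embed(tgt)), hidden)
        # Dot-product attention over encoder states (assumed variant).
        scores = torch.bmm(dec_out, enc_outputs.transpose(1, 2))
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_outputs)
        logits = self.out(torch.cat([dec_out, context], dim=-1))
        return logits, hidden                     # logits: (batch, tgt_len, vocab)
```

For SCAN, the same architecture applies with embedding/hidden size 200 instead of 300, per the paper.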