Revisiting Iterative Back-Translation from the Perspective of Compositional Generalization

Authors: Yinuo Guo, Hualei Zhu, Zeqi Lin, Bei Chen, Jian-Guang Lou, Dongmei Zhang (pp. 7601-7609)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We first empirically show that iterative back-translation substantially improves the performance on compositional generalization benchmarks (CFQ and SCAN). ... Our experiments are also conducted on these two benchmarks. ... We report the experimental results in Table 1 (SCAN) and 2 (CFQ).'
Researcher Affiliation | Collaboration | (1) Key Laboratory of Computational Linguistics, School of EECS, Peking University; (2) School of Computer Science and Engineering, Beihang University; (3) Microsoft Research Asia
Pseudocode | Yes | The paper gives 'Algorithm 1 Iterative Back-Translation ... Algorithm 2 Curriculum Iterative Back-Translation' (a sketch of the Algorithm 1 loop follows this table).
Open Source Code | No | The paper states: 'We implement iterative back-translation based on the code of UNdreaMT (Artetxe et al. 2018)' and links to the UNdreaMT GitHub repository (https://github.com/artetxem/undreamt). This is a third-party tool the authors built on, not their own open-source implementation of the method described in this paper.
Open Datasets | Yes | Two benchmarks, SCAN (Lake and Baroni 2018) and CFQ (Keysers et al. 2020), have been proposed for measuring the compositional generalization ability of machine learning-based NLU systems; the paper's experiments are conducted on both (a loader sketch for SCAN's plain-text format follows this table).
Dataset Splits | No | For SCAN, the paper states: 'we randomly hold out half of the original test data as the dev data' (a hold-out sketch follows this table). For CFQ, it states that the 'CFQ dataset is splitted into train and test sets based on two principles' but does not specify a distinct validation (dev) split used in the experiments. Complete, explicit train/validation/test splits are therefore not provided for all experimental datasets.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper mentions that the implementation is 'based on the code of UNdreaMT', but it does not specify version numbers for core software dependencies such as Python, PyTorch/TensorFlow, CUDA, or other libraries used in the experimental setup.
Experiment Setup | Yes | For CFQ, the authors use a 2-layer GRU encoder-decoder model equipped with attention; both the word-embedding and hidden-state sizes are set to 300, dropout is 0.5, and training lasts 30,000 iterations with batch size 128. For SCAN, they also use a 2-layer GRU encoder-decoder with attention; both the embedding and hidden sizes are set to 200, dropout is 0.5, and training lasts 35,000 iterations with batch size 64. K = 5000 steps is used in Algorithm 1 for both benchmarks. Curriculum iterative back-translation is evaluated with varying c (the number of steps each stage is trained): 2000/2500/3000/3500/4000. (A configuration sketch follows this table.)
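
For readers who want the shape of Algorithm 1 in code, the sketch below is a minimal reconstruction of the iterative back-translation loop as summarized in the table above, not the authors' implementation. `train_step`, `translate`, and the batch iterators are hypothetical stand-ins for what a seq2seq framework such as UNdreaMT would provide, and treating K as an evaluation/checkpoint interval is our reading of the K = 5000 setting.

```python
# Minimal sketch of iterative back-translation (Algorithm 1), written
# against two hypothetical helpers that any seq2seq framework provides:
#   train_step(model, src_batch, tgt_batch) -- one supervised update
#   translate(model, batch)                 -- greedy/beam decoding
# `fwd` maps natural language -> logical form, `bwd` the reverse.

def iterative_back_translation(fwd, bwd, parallel, mono_src, mono_tgt,
                               train_step, translate,
                               total_steps=30000, k=5000):
    for step in range(1, total_steps + 1):
        # 1) Supervised updates on the original parallel data.
        src, tgt = next(parallel)
        train_step(fwd, src, tgt)
        train_step(bwd, tgt, src)

        # 2) Back-translate monolingual target data to train fwd.
        y = next(mono_tgt)
        train_step(fwd, translate(bwd, y), y)   # (pseudo-src, real tgt)

        # 3) Back-translate monolingual source data to train bwd.
        x = next(mono_src)
        train_step(bwd, translate(fwd, x), x)   # (pseudo-tgt, real src)

        # Every k steps, evaluate/refresh checkpoints (our reading of
        # the K = 5000 setting reported for Algorithm 1).
        if step % k == 0:
            print(f"step {step}: evaluate on dev data")
```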
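The Open Datasets row refers to SCAN's distribution format. In the official SCAN release, each example is one plain-text line of the form `IN: <command> OUT: <action sequence>`; a minimal loader under that assumption:

```python
# Loads a SCAN split from its plain-text format, where each line reads
# "IN: <command> OUT: <action sequence>" (the format used by the files
# in the official SCAN repository). Returns (command, actions) pairs.

def load_scan(path):
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split once on the "OUT:" marker; the "IN:" prefix is fixed.
            src, tgt = line.split(" OUT: ", 1)
            pairs.append((src[len("IN: "):], tgt))
    return pairs
```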
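The SCAN dev split described in the Dataset Splits row (randomly holding out half of the original test data as dev data) comes down to a few lines; the fixed seed is our addition for reproducibility, since the paper does not report one.

```python
import random

# Mirrors the SCAN protocol quoted above: randomly hold out half of the
# original test set as dev data. The seed is our addition; the paper
# does not report one.

def split_dev_from_test(test_pairs, seed=0):
    rng = random.Random(seed)
    shuffled = test_pairs[:]            # copy so the original order is kept
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (dev_pairs, new_test_pairs)
```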
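Finally, the Experiment Setup row pins down the reported hyperparameters. The sketch below collects them into configs and shows a matching encoder in PyTorch; the module structure and the attention decoder (omitted here) are our illustrative reconstruction, not the authors' code.

```python
import torch.nn as nn

# Hyperparameters as reported in the paper; everything else (module
# structure, attention details) is an illustrative reconstruction.
CFQ_CFG  = dict(emb=300, hidden=300, layers=2, dropout=0.5,
                iterations=30000, batch_size=128)
SCAN_CFG = dict(emb=200, hidden=200, layers=2, dropout=0.5,
                iterations=35000, batch_size=64)

class GRUEncoder(nn.Module):
    """2-layer GRU encoder matching the reported sizes."""
    def __init__(self, vocab_size, cfg):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, cfg["emb"])
        self.drop = nn.Dropout(cfg["dropout"])
        self.gru = nn.GRU(cfg["emb"], cfg["hidden"],
                          num_layers=cfg["layers"], batch_first=True,
                          dropout=cfg["dropout"])

    def forward(self, tokens):
        # tokens: (batch, seq_len) int64 ids
        x = self.drop(self.embed(tokens))
        outputs, hidden = self.gru(x)   # outputs feed decoder attention
        return outputs, hidden
```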