Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation

Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

AAAI 2021, pp. 14310-14318 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on 10 different low-resource domains show that meta-curriculum learning can improve the translation performance of both familiar and unfamiliar domains.
Researcher Affiliation | Academia | Runzhe Zhan*, Xuebo Liu*, Derek F. Wong, Lidia S. Chao. NLP2CT Lab, Department of Computer and Information Science, University of Macau. nlp2ct.{runzhe,xuebo}@gmail.com, {derekfw,lidiasc}@um.edu.mo
Pseudocode | Yes | Algorithm 1: Meta-Curriculum Learning Policy
Open Source Code | Yes | All the codes and data are freely available at https://github.com/NLP2CT/Meta-Curriculum.
Open Datasets | Yes | The dataset for domain adaptation is made up of ten parallel En-De corpora (Bible-uedin (Christodouloupoulos and Steedman 2015), Books, ECB, EMEA, Global Voices, JRC-Acquis, KDE4, TED2013, WMT-News.v2019) which are publicly available at OPUS (Tiedemann 2012).
Dataset Splits | Yes | For each task T, the token amount of the support set S and query set Q would be approximately limited to 8k and 16k, respectively. Table 1 shows the detailed statistics.
Hardware Specification | No | The paper discusses the software toolkit (fairseq) and optimizers used, but does not provide specific hardware details such as GPU or CPU models, or memory specifications for running the experiments.
Software Dependencies | No | The paper mentions the use of the 'fairseq' toolkit, the 'Moses' tokenizer, and 'SentencePiece', but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Both of them were trained using the Adam optimizer (Kingma and Ba 2015) (β1 = 0.9, β2 = 0.98), but with different learning rates (lr_nlm = 5e-4, lr_finetune = 5e-5, lr_translation = 7e-4, lr_meta = 1e-5). The learning rate scheduler and warm-up policy (n_warmup = 4000) for training the vanilla Transformer are the same as in Vaswani et al. (2017). Furthermore, the number of updating epochs during the adaptation period is strictly limited to 20 to simulate quick adaptation and verify robustness under limited settings.
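
The "Pseudocode" row above only names Algorithm 1 (Meta-Curriculum Learning Policy) without reproducing it. As a rough, non-authoritative illustration of what such a curriculum policy can look like, the sketch below orders candidate sentences from general-looking to domain-specific and gradually unlocks harder data as meta-training progresses; the function names, scoring functions, and schedule are hypothetical stand-ins, not the authors' Algorithm 1.

```python
"""Illustrative sketch only: the real Algorithm 1 is given in the paper.
The scoring functions and linear schedule here are hypothetical."""
import random
from typing import Callable, List, Sequence


def curriculum_order(examples: Sequence[str],
                     general_score: Callable[[str], float],
                     domain_score: Callable[[str], float]) -> List[str]:
    """Order examples from 'general-looking' to 'domain-specific-looking'.

    A smaller (domain - general) score gap is treated as easier/earlier,
    mimicking a curriculum that starts from knowledge shared with the
    general domain and moves toward domain-specific knowledge.
    """
    return sorted(examples, key=lambda s: domain_score(s) - general_score(s))


def sample_task(ordered: Sequence[str], step: int, total_steps: int,
                task_size: int = 4) -> List[str]:
    """Sample one meta-learning task from the currently 'unlocked' prefix.

    The unlocked fraction grows linearly with progress, so early tasks draw
    from easy (general-like) data and later tasks may include harder,
    domain-specific sentences.
    """
    competence = min(1.0, (step + 1) / total_steps)
    unlocked = ordered[: max(task_size, int(competence * len(ordered)))]
    return random.sample(unlocked, min(task_size, len(unlocked)))


if __name__ == "__main__":
    corpus = [f"sentence {i}" for i in range(20)]
    # Toy scores; in practice these would come from language models.
    gen = lambda s: 0.1 * len(s)
    dom = lambda s: 0.1 * (hash(s) % 10)
    ordered = curriculum_order(corpus, gen, dom)
    for step in (0, 5, 9):
        print(step, sample_task(ordered, step, total_steps=10))
```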
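The "Dataset Splits" row quotes per-task token budgets of roughly 8k (support) and 16k (query). A minimal sketch of how such a budgeted split could be produced is shown below; the greedy filling strategy and whitespace token counting are assumptions, not taken from the released code.

```python
"""Hypothetical helper for the ~8k/16k-token support/query budgets quoted
in the 'Dataset Splits' row; not the authors' implementation."""
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def split_task(pairs: Sequence[Pair],
               support_budget: int = 8_000,
               query_budget: int = 16_000) -> Tuple[List[Pair], List[Pair]]:
    """Greedily fill the support set S up to ~8k target tokens, then the
    query set Q up to ~16k target tokens; leftover pairs are discarded."""
    support: List[Pair] = []
    query: List[Pair] = []
    s_tokens = q_tokens = 0
    for src, tgt in pairs:
        n = len(tgt.split())  # crude whitespace token count
        if s_tokens + n <= support_budget:
            support.append((src, tgt))
            s_tokens += n
        elif q_tokens + n <= query_budget:
            query.append((src, tgt))
            q_tokens += n
    return support, query
```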
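The "Experiment Setup" row quotes Adam with β1 = 0.9, β2 = 0.98, per-stage learning rates, a 4000-step warm-up, and a 20-epoch adaptation limit. Below is a hedged PyTorch sketch of those optimizer settings, assuming the standard Transformer-style inverse-square-root schedule relative to a peak learning rate; the placeholder model and the choice of 7e-4 (the quoted lr_translation) are illustrative only.

```python
"""Hedged sketch of the quoted optimizer settings; the model and peak
learning rate are placeholders, not the paper's actual training script."""
import math
import torch


def make_optimizer(model: torch.nn.Module, peak_lr: float = 7e-4,
                   warmup: int = 4000):
    """Adam with the quoted betas; the LambdaLR scheduler reproduces the
    usual inverse-square-root warm-up relative to peak_lr."""
    opt = torch.optim.Adam(model.parameters(), lr=peak_lr, betas=(0.9, 0.98))

    def inv_sqrt(step: int) -> float:  # multiplicative factor on peak_lr
        step = max(step, 1)
        return min(step / warmup, math.sqrt(warmup / step))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=inv_sqrt)
    return opt, sched


if __name__ == "__main__":
    model = torch.nn.Linear(8, 8)  # stand-in for the Transformer model
    opt, sched = make_optimizer(model, peak_lr=7e-4)  # lr_translation = 7e-4
    for epoch in range(20):  # adaptation limited to 20 epochs in the paper
        opt.step()
        sched.step()
```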