Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation
Authors: Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao
AAAI 2021, pp. 14310-14318
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on 10 different low-resource domains show that meta-curriculum learning can improve the translation performance of both familiar and unfamiliar domains. |
| Researcher Affiliation | Academia | Runzhe Zhan*, Xuebo Liu*, Derek F. Wong, Lidia S. Chao; NLP2CT Lab, Department of Computer and Information Science, University of Macau; nlp2ct.{runzhe,xuebo}@gmail.com, {derekfw,lidiasc}@um.edu.mo |
| Pseudocode | Yes | Algorithm 1: Meta-Curriculum Learning Policy. (A hedged meta-learning sketch of such a policy follows the table below.) |
| Open Source Code | Yes | All the codes and data are freely available at https://github.com/NLP2CT/Meta-Curriculum. |
| Open Datasets | Yes | The dataset for domain adaptation is made up of ten parallel En-De corpora (Bible-uedin (Christodouloupoulos and Steedman 2015), Books, ECB, EMEA, Global Voices, JRC-Acquis, KDE4, TED2013, WMT-News.v2019) which are publicly available at OPUS (Tiedemann 2012). |
| Dataset Splits | Yes | For each task T, the token counts of the support set S and the query set Q are limited to approximately 8k and 16k, respectively. Table 1 shows the detailed statistics. (A token-budget split sketch follows the table below.) |
| Hardware Specification | No | The paper discusses the software toolkit (fairseq) and optimizers used, but does not provide specific hardware details such as GPU or CPU models, or memory specifications for running experiments. |
| Software Dependencies | No | The paper mentions the use of 'fairseq' toolkit, 'Moses tokenizer', and 'sentencepieces', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Both models were trained with the Adam optimizer (Kingma and Ba 2015) (β1 = 0.9, β2 = 0.98) but with different learning rates (lr_nlm = 5e-4, lr_finetune = 5e-5, lr_translation = 7e-4, lr_meta = 1e-5). The learning-rate scheduler and warm-up policy (n_warmup = 4000) for training the vanilla Transformer follow Vaswani et al. (2017). The number of update epochs during the adaptation period is strictly limited to 20 to simulate quick adaptation and verify robustness under limited settings. (An optimizer/scheduler sketch follows the table below.) |
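The Pseudocode row refers to the paper's Algorithm 1 (Meta-Curriculum Learning Policy). The block below is only a minimal, hypothetical sketch of a curriculum-ordered meta-learning loop in that spirit, not the authors' algorithm: tasks are visited from easy to hard according to a precomputed difficulty score, a copy of the model is adapted on each task's support set, and the meta-parameters are then updated with query-set gradients in a first-order MAML style. The names `meta_curriculum_epoch` and `compute_loss`, the difficulty scores, and the toy tensors are all illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def compute_loss(model, batch):
    """Placeholder objective; a real setup would compute the NMT cross-entropy
    on a batch of (source, target) sentence pairs."""
    x, y = batch
    return F.mse_loss(model(x), y)

def meta_curriculum_epoch(model, tasks, inner_lr=5e-5, meta_lr=1e-5, inner_steps=3):
    """One meta-epoch: visit tasks from easiest to hardest, adapt a copy of the
    model on each task's support set, then update the meta-parameters with the
    query-set gradients (first-order MAML-style approximation)."""
    tasks = sorted(tasks, key=lambda t: t["difficulty"])   # curriculum: easy -> hard
    meta_opt = torch.optim.Adam(model.parameters(), lr=meta_lr)
    for task in tasks:
        learner = copy.deepcopy(model)                     # task-specific copy
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                       # inner loop on the support set
            loss = compute_loss(learner, task["support"])
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Outer loop: query-set gradients of the adapted learner are applied to
        # the meta-parameters (a first-order stand-in for the meta-gradient).
        query_loss = compute_loss(learner, task["query"])
        grads = torch.autograd.grad(query_loss, tuple(learner.parameters()))
        meta_opt.zero_grad()
        for p, g in zip(model.parameters(), grads):
            p.grad = g.clone()
        meta_opt.step()
    return model

# Toy usage with random tensors standing in for tokenized parallel data.
model = nn.Linear(16, 16)
tasks = [
    {"difficulty": 0.7, "support": (torch.randn(8, 16), torch.randn(8, 16)),
     "query": (torch.randn(16, 16), torch.randn(16, 16))},
    {"difficulty": 0.2, "support": (torch.randn(8, 16), torch.randn(8, 16)),
     "query": (torch.randn(16, 16), torch.randn(16, 16))},
]
meta_curriculum_epoch(model, tasks)
```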
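The Dataset Splits row quotes per-task budgets of roughly 8k support tokens and 16k query tokens. A minimal sketch of such a token-budget split is given below, assuming whitespace tokenization and a hypothetical `split_task` helper; the paper's actual preprocessing (Moses tokenization and sentencepiece segmentation) is not reproduced here.

```python
# Minimal sketch, not the authors' preprocessing: cap one domain task at roughly
# 8k support tokens and 16k query tokens, counting tokens by whitespace.
def split_task(pairs, support_budget=8000, query_budget=16000):
    """pairs: list of (source, target) sentence strings for one domain task."""
    support, query = [], []
    support_tokens = query_tokens = 0
    for src, tgt in pairs:
        n = len(src.split())  # crude whitespace token count (assumption)
        if support_tokens + n <= support_budget:
            support.append((src, tgt))
            support_tokens += n
        elif query_tokens + n <= query_budget:
            query.append((src, tgt))
            query_tokens += n
        else:
            break
    return support, query

# Toy usage with two En-De sentence pairs standing in for a domain corpus.
pairs = [("an example sentence", "ein Beispielsatz"),
         ("another sentence", "noch ein Satz")]
support, query = split_task(pairs)
```

A call such as `support, query = split_task(domain_pairs)` would then yield one task's support and query sets under those budgets.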
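The Experiment Setup row lists Adam with β1 = 0.9 and β2 = 0.98, several stage-specific learning rates, and the Vaswani et al. (2017) warm-up policy with n_warmup = 4000. The following is a minimal PyTorch sketch, assuming a placeholder model in place of the fairseq Transformer and using the quoted lr_translation = 7e-4 as the peak rate; the helper `inverse_sqrt_factor` is an illustrative stand-in for an inverse-square-root scheduler, not the paper's actual fairseq configuration.

```python
import torch

# Placeholder model standing in for the Transformer; the actual experiments use fairseq.
model = torch.nn.Linear(512, 512)

# Adam with beta1 = 0.9 and beta2 = 0.98, as reported; 7e-4 is the quoted
# translation-model learning rate (the other stages use 5e-4, 5e-5, and 1e-5).
optimizer = torch.optim.Adam(model.parameters(), lr=7e-4, betas=(0.9, 0.98))

def inverse_sqrt_factor(step, warmup=4000):
    """Vaswani et al. (2017)-style schedule: linear warm-up over `warmup` steps,
    then decay proportional to 1/sqrt(step); returns a multiplier on the peak lr."""
    step = max(step, 1)
    return step / warmup if step < warmup else (warmup / step) ** 0.5

# In training, scheduler.step() would be called after each optimizer update.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt_factor)

# Effective learning rate at a few update steps (the peak is reached at step 4000).
for step in (1, 2000, 4000, 16000):
    print(step, 7e-4 * inverse_sqrt_factor(step))
```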