Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Authors: Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu (pp. 7839-7846)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speed up (more than 10 times) the inference process over AT baselines.
Researcher Affiliation | Collaboration | Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China; Microsoft Research; guojunll@mail.ustc.edu.cn, {linlixu, cheneh}@ustc.edu.cn, {xuta, taoqin, tyliu}@microsoft.com
Pseudocode | Yes | Algorithm 1: Fine-tuning by curriculum learning for NAT (FCL-NAT). (A schedule sketch follows the table.)
Open Source Code | Yes | We implement our model on Tensorflow [3], and we have released our code [4]. [4] https://github.com/lemmonation/fcl-nat
Open Datasets | Yes | We evaluate our method on four widely used benchmark datasets: IWSLT14 German to English translation (IWSLT14 De-En) and WMT14 English to German/German to English translation (WMT14 En-De/De-En) [2]. [2] https://www.statmt.org/wmt14/translation-task
Dataset Splits | Yes | Specifically, for the IWSLT14 De-En task, we have 153k/7k/7k parallel bilingual sentences in the training/dev/test sets respectively. WMT14 En-De/De-En has a much larger dataset which contains 4.5M training pairs, where newstest2013 and newstest2014 are used as the validation and test set respectively. (A split summary follows the table.)
Hardware Specification | Yes | We train the NAT model on 8/1 Nvidia M40 GPUs for WMT/IWSLT datasets respectively... which is conducted on a single Nvidia P100 GPU to ensure a fair comparison with baselines (Gu et al. 2017; Wang et al. 2019; Guo et al. 2019).
Software Dependencies | No | The paper states 'We implement our model on Tensorflow [3],' and footnote 3 links to https://github.com/tensorflow/tensor2tensor. However, it does not specify a version number for TensorFlow or any other software dependencies.
Experiment Setup | Yes | For WMT14 datasets, we use the hyperparameters of a base transformer (d_model = d_hidden = 512, n_layer = 6, n_head = 8). For IWSLT14 datasets, we utilize smaller architectures (d_model = d_hidden = 256, n_layer = 5, n_head = 4) for IWSLT14... We set the beam size to be 4 for the teacher model... We set α_M = 0.6 for all tasks... We set I_AT = 55k, I_CL = 1.0M, I_NAT = 0.5M in both settings. (A configuration summary follows the table.)