Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
Authors: Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu (pp. 7839-7846)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (more than 10 times) the inference process over AT baselines. |
| Researcher Affiliation | Collaboration | Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China; Microsoft Research. guojunll@mail.ustc.edu.cn, {linlixu, cheneh}@ustc.edu.cn, {xuta, taoqin, tyliu}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Fine-tuning by curriculum learning for NAT (FCL-NAT) |
| Open Source Code | Yes | We implement our model on Tensorflow (footnote 3), and we have released our code (footnote 4: https://github.com/lemmonation/fcl-nat). |
| Open Datasets | Yes | We evaluate our method on four widely used benchmark datasets: IWSLT14 German to English translation (IWSLT14 De-En) and WMT14 English to German/German to English translation (WMT14 En-De/De-En) (footnote 2: https://www.statmt.org/wmt14/translation-task). |
| Dataset Splits | Yes | Specifically, for the IWSLT14 De-En task, we have 153k/7k/7k parallel bilingual sentences in the training/dev/test sets respectively. WMT14 En-De/De-En has a much larger dataset which contains 4.5M training pairs, where newstest2013 and newstest2014 are used as the validation and test set respectively. |
| Hardware Specification | Yes | We train the NAT model on 8/1 Nvidia M40 GPUs for WMT/IWSLT datasets respectively... which is conducted on a single Nvidia P100 GPU to ensure a fair comparison with baselines (Gu et al. 2017; Wang et al. 2019; Guo et al. 2019). |
| Software Dependencies | No | The paper states 'We implement our model on Tensorflow3,' and footnote 3 links to https://github.com/tensorflow/tensor2tensor. However, it does not specify a version number for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | For WMT14 datasets, we use the hyperparameters of a base transformer (d_model = d_hidden = 512, n_layer = 6, n_head = 8), and we utilize a smaller architecture (d_model = d_hidden = 256, n_layer = 5, n_head = 4) for IWSLT14... We set the beam size to be 4 for the teacher model... We set α_M = 0.6 for all tasks... We set I_AT = 55k, I_CL = 1.0M, I_NAT = 0.5M in both settings. (An illustrative sketch of this training schedule follows the table.) |
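
The pseudocode and experiment-setup rows above describe a three-phase training schedule: autoregressive (AT) pretraining, curriculum fine-tuning, and pure non-autoregressive (NAT) training. Below is a minimal sketch of such a schedule, not the authors' released implementation: the linear pacing function, the `model` object, and its `train_at_step`/`train_nat_step` methods are hypothetical placeholders, while the step counts I_AT, I_CL, and I_NAT come from the quoted hyperparameters.

```python
# Illustrative sketch of an FCL-NAT-style three-phase training schedule.
# Not the authors' code; see https://github.com/lemmonation/fcl-nat for
# the released implementation.
import random

I_AT = 55_000      # AT pretraining steps (quoted: I_AT = 55k)
I_CL = 1_000_000   # curriculum fine-tuning steps (quoted: I_CL = 1.0M)
I_NAT = 500_000    # pure NAT training steps (quoted: I_NAT = 0.5M)


def pacing(step: int, total: int) -> float:
    """Fraction of NAT-mode batches at a given curriculum step.

    A linear ramp is assumed here purely for illustration; the paper
    defines its own pacing functions.
    """
    return min(1.0, max(0.0, step / total))


def train(model, data_iterator):
    # Phase 1: warm up with standard autoregressive training.
    for _ in range(I_AT):
        model.train_at_step(next(data_iterator))

    # Phase 2: curriculum fine-tuning. Each batch is trained in NAT mode
    # with probability given by the pacing function, otherwise in AT mode,
    # so training shifts gradually from AT to NAT.
    for step in range(I_CL):
        if random.random() < pacing(step, I_CL):
            model.train_nat_step(next(data_iterator))
        else:
            model.train_at_step(next(data_iterator))

    # Phase 3: finish with pure NAT training.
    for _ in range(I_NAT):
        model.train_nat_step(next(data_iterator))
```

For the exact Algorithm 1 and the role of the remaining hyperparameters (e.g., α_M), consult the paper and the released code linked above.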