Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Authors: Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu (pp. 7839-7846)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speed up (more than 10 times) the inference process over AT baselines.
Researcher Affiliation | Collaboration | Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China; Microsoft Research; guojunll@mail.ustc.edu.cn, {linlixu, cheneh}@ustc.edu.cn, {xuta, taoqin, tyliu}@microsoft.com
Pseudocode | Yes | Algorithm 1: Fine-tuning by curriculum learning for NAT (FCL-NAT). (A schedule sketch follows the table.)
Open Source Code | Yes | We implement our model on Tensorflow [3], and we have released our code [4]. [4] https://github.com/lemmonation/fcl-nat
Open Datasets | Yes | We evaluate our method on four widely used benchmark datasets: IWSLT14 German to English translation (IWSLT14 De-En) and WMT14 English to German/German to English translation (WMT14 En-De/De-En) [2]. [2] https://www.statmt.org/wmt14/translation-task
Dataset Splits | Yes | Specifically, for the IWSLT14 De-En task, we have 153k/7k/7k parallel bilingual sentences in the training/dev/test sets respectively. WMT14 En-De/De-En has a much larger dataset which contains 4.5M training pairs, where newstest2013 and newstest2014 are used as the validation and test set respectively. (A split summary follows the table.)
Hardware Specification | Yes | We train the NAT model on 8/1 Nvidia M40 GPUs for WMT/IWSLT datasets respectively... which is conducted on a single Nvidia P100 GPU to ensure a fair comparison with baselines (Gu et al. 2017; Wang et al. 2019; Guo et al. 2019).
Software Dependencies | No | The paper states 'We implement our model on Tensorflow [3],' and footnote 3 links to https://github.com/tensorflow/tensor2tensor. However, it does not specify a version number for TensorFlow or any other software dependencies.
Experiment Setup | Yes | For WMT14 datasets, we use the hyperparameters of a base transformer (d_model = d_hidden = 512, n_layer = 6, n_head = 8). For IWSLT14 datasets, we utilize smaller architectures (d_model = d_hidden = 256, n_layer = 5, n_head = 4) for IWSLT14... We set the beam size to be 4 for the teacher model... We set α_M = 0.6 for all tasks... We set I_AT = 55k, I_CL = 1.0M, I_NAT = 0.5M in both settings. (A configuration summary follows the table.)