Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Authors: Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu (pp. 7839-7846)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (more than 10 times) the inference process over AT baselines.
Researcher Affiliation | Collaboration | Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China; Microsoft Research
Pseudocode | Yes | Algorithm 1: Fine-tuning by curriculum learning for NAT (FCL-NAT)
Open Source Code | Yes | We implement our model on TensorFlow, and we have released our code at https://github.com/lemmonation/fcl-nat
Open Datasets | Yes | We evaluate our method on four widely used benchmark datasets: IWSLT14 German-to-English translation (IWSLT14 De-En) and WMT14 English-to-German/German-to-English translation (WMT14 En-De/De-En); see https://www.statmt.org/wmt14/translation-task
Dataset Splits | Yes | Specifically, for the IWSLT14 De-En task, we have 153k/7k/7k parallel bilingual sentences in the training/dev/test sets respectively. WMT14 En-De/De-En has a much larger dataset which contains 4.5M training pairs, where newstest2013 and newstest2014 are used as the validation and test sets respectively.
Hardware Specification | Yes | We train the NAT model on 8/1 Nvidia M40 GPUs for WMT/IWSLT datasets respectively... which is conducted on a single Nvidia P100 GPU to ensure a fair comparison with baselines (Gu et al. 2017; Wang et al. 2019; Guo et al. 2019).
Software Dependencies | No | The paper states "We implement our model on Tensorflow", with a footnote linking to https://github.com/tensorflow/tensor2tensor. However, it does not specify a version number for TensorFlow or any other software dependencies.
Experiment Setup | Yes | For WMT14 datasets, we use the hyperparameters of a base Transformer (d_model = d_hidden = 512, n_layer = 6, n_head = 8). For IWSLT14 datasets, we utilize a smaller architecture (d_model = d_hidden = 256, n_layer = 5, n_head = 4)... We set the beam size to 4 for the teacher model... We set alpha_M = 0.6 for all tasks... We set I_AT = 55k, I_CL = 1.0M, I_NAT = 0.5M in both settings.
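For quick reference, the model sizes and training-schedule constants reported in the experiment setup can be transcribed as plain configuration dictionaries. This is a sketch for readability only: the field names are illustrative assumptions, not identifiers from the authors' released code.

```python
# Hyperparameters transcribed from the paper's reported experiment setup.
# Field names are illustrative; they are NOT taken from the released
# code at https://github.com/lemmonation/fcl-nat.

MODEL_CONFIGS = {
    # WMT14 En-De / De-En: base Transformer size
    "wmt14": {"d_model": 512, "d_hidden": 512, "n_layers": 6, "n_heads": 8},
    # IWSLT14 De-En: smaller architecture
    "iwslt14": {"d_model": 256, "d_hidden": 256, "n_layers": 5, "n_heads": 4},
}

# Schedule constants reported for both settings (roles are defined in the paper):
SCHEDULE = {
    "alpha_M": 0.6,          # alpha_M = 0.6 for all tasks
    "teacher_beam_size": 4,  # beam size for the teacher (AT) model
    "I_AT": 55_000,          # AT training steps
    "I_CL": 1_000_000,       # curriculum-learning fine-tuning steps
    "I_NAT": 500_000,        # NAT training steps
}

if __name__ == "__main__":
    for name, cfg in MODEL_CONFIGS.items():
        # Sanity check: the model dimension must split evenly across heads.
        assert cfg["d_model"] % cfg["n_heads"] == 0
        print(name, "per-head dim:", cfg["d_model"] // cfg["n_heads"])
```

Both configurations happen to yield a per-head dimension of 64, the usual choice for Transformer attention heads.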