Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation

Authors: Jinglin Liu, Yi Ren, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines and reduces the performance gap between NAT and AT models to 1-2 BLEU points, demonstrating the effectiveness of our proposed method.
Researcher Affiliation | Collaboration | Zhejiang University; Microsoft Research Asia. {jinglinliu,rayeren,zc99,zhaozhou}@zju.edu.cn, {xuta,taoqin,tyliu}@microsoft.com
Pseudocode | No | The paper describes its model architecture and process with textual descriptions and a figure (Figure 1), but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the method.
Open Datasets | Yes | We evaluate our method on three standard translation datasets: IWSLT14 German-to-English (De-En) dataset, IWSLT16 English-to-German (En-De) dataset and WMT14 English-to-German (En-De) dataset. Following Li et al. [2019], we reverse WMT14 English-to-German to get WMT14 German-to-English (De-En) dataset.
Dataset Splits | Yes | IWSLT14 dataset contains 153k/7k/7k parallel bilingual sentences for training/dev/test set respectively; IWSLT16 dataset contains 195k/1k/1k parallel bilingual sentences for training/dev/test set and WMT14 dataset contains 4.5M parallel sentence pairs for the training set, where newstest2014 and newstest2013 are used as test and validation set respectively, following previous works [Gu et al., 2018; Guo et al., 2019b]. (Split sizes are restated in the first sketch after this table.)
Hardware Specification | Yes | We run the training procedure on 8 NVIDIA Tesla P100 GPUs for WMT and 2 NVIDIA 2080Ti GPUs for IWSLT datasets respectively.
Software Dependencies | No | The paper states, 'We implement our model on Tensor2Tensor [Vaswani et al., 2018].' However, it does not specify the version number of Tensor2Tensor or any other software dependencies with version numbers.
Experiment Setup | Yes | We follow Guo et al. [2019b] for configuration hyperparameters: For WMT14 datasets, we use the hyperparameters of a base Transformer (d_model = d_hidden = 512, n_layer = 6, n_head = 8). For IWSLT14 and IWSLT16 datasets, we utilize a small Transformer (d_model = d_hidden = 256, n_layer = 6, n_head = 4). ... We train all models using Adam following the optimizer settings and learning rate schedule in Transformer [Vaswani et al., 2017]. ... The training steps of each phase are listed in Table 2.
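
For quick reference, the split sizes quoted in the Dataset Splits row can be restated in code. The following is a minimal sketch assuming a plain Python dictionary; the variable name DATASET_SPLITS and the layout are illustrative choices, not part of the paper or any released code.

```python
# Minimal sketch of the train/dev/test splits reported in the paper.
# Counts are the approximate figures quoted above (153k/7k/7k, 195k/1k/1k, 4.5M);
# for WMT14, newstest2013 serves as validation and newstest2014 as test.
DATASET_SPLITS = {
    "IWSLT14 De-En": {"train": 153_000, "dev": 7_000, "test": 7_000},
    "IWSLT16 En-De": {"train": 195_000, "dev": 1_000, "test": 1_000},
    "WMT14 En-De / De-En": {
        "train": 4_500_000,
        "dev": "newstest2013",
        "test": "newstest2014",
    },
}

if __name__ == "__main__":
    for name, splits in DATASET_SPLITS.items():
        print(f"{name}: {splits}")
```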
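The Experiment Setup and Hardware Specification rows can likewise be summarized as a hedged configuration sketch. The dictionary and function names below are assumptions for illustration, not the authors' (unreleased) Tensor2Tensor hyperparameter files; the warmup_steps default of 4000 comes from the original Transformer paper [Vaswani et al., 2017], which the authors say they follow, and is not quoted in this paper.

```python
# Sketch of the reported model and training configuration, under the
# assumptions stated above.

TRANSFORMER_CONFIGS = {
    # Base Transformer for WMT14 En-De / De-En, trained on 8x NVIDIA Tesla P100.
    "wmt14": {"d_model": 512, "d_hidden": 512, "n_layers": 6, "n_heads": 8,
              "gpus": "8x NVIDIA Tesla P100"},
    # Small Transformer for IWSLT14 / IWSLT16, trained on 2x NVIDIA 2080Ti.
    "iwslt": {"d_model": 256, "d_hidden": 256, "n_layers": 6, "n_heads": 4,
              "gpus": "2x NVIDIA 2080Ti"},
}


def transformer_lr(step: int, d_model: int, warmup_steps: int = 4000) -> float:
    """Inverse-square-root learning-rate schedule from Vaswani et al. [2017]:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for name, cfg in TRANSFORMER_CONFIGS.items():
        lr = transformer_lr(8000, cfg["d_model"])
        print(f"{name}: {cfg} | Adam lr at step 8000 = {lr:.6f}")
```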