Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation
Authors: Jinglin Liu, Yi Ren, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines and reduces the performance gap between NAT and AT models to 1-2 BLEU points, demonstrating the effectiveness of our proposed method. |
| Researcher Affiliation | Collaboration | Zhejiang University; Microsoft Research Asia. {jinglinliu,rayeren,zc99,zhaozhou}@zju.edu.cn, {xuta,taoqin,tyliu}@microsoft.com |
| Pseudocode | No | The paper describes its model architecture and process with textual descriptions and a figure (Figure 1), but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for their method. |
| Open Datasets | Yes | We evaluate our method on three standard translation datasets: IWSLT14 German-to-English (De-En) dataset, IWSLT16 English-to-German (En-De) dataset and WMT14 English-to-German (En-De) dataset. Following Li et al. [2019], we reverse WMT14 English-to-German to get the WMT14 German-to-English (De-En) dataset. |
| Dataset Splits | Yes | The IWSLT14 dataset contains 153k/7k/7k parallel bilingual sentences for the training/dev/test sets respectively; the IWSLT16 dataset contains 195k/1k/1k parallel bilingual sentences for the training/dev/test sets; and the WMT14 dataset contains 4.5M parallel sentence pairs for the training set, where newstest2014 and newstest2013 are used as the test and validation sets respectively, following previous works [Gu et al., 2018; Guo et al., 2019b]. |
| Hardware Specification | Yes | We run the training procedure on 8 NVIDIA Tesla P100 GPUs for WMT and 2 NVIDIA 2080Ti GPUs for IWSLT datasets respectively. |
| Software Dependencies | No | The paper states, 'We implement our model on Tensor2Tensor [Vaswani et al., 2018].' However, it does not specify the version number of Tensor2Tensor or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We follow Guo et al. [2019b] for configuration hyperparameters: For the WMT14 datasets, we use the hyperparameters of a base Transformer (d_model = d_hidden = 512, n_layer = 6, n_head = 8). For the IWSLT14 and IWSLT16 datasets, we utilize a small Transformer (d_model = d_hidden = 256, n_layer = 6, n_head = 4). ... We train all models using Adam following the optimizer settings and learning rate schedule in Transformer [Vaswani et al., 2017]. ... The training steps of each phase are listed in Table 2. |
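For context, the sketch below collects the hyperparameters quoted in the Experiment Setup row and the inverse-square-root learning-rate schedule from Transformer [Vaswani et al., 2017], which the authors state they follow. This is a minimal illustration, not the authors' Tensor2Tensor code: the function names and the warmup_steps=4000 value are assumptions introduced here.

```python
# Illustrative sketch of the reported configuration (not the authors' released code).

def transformer_config(dataset: str) -> dict:
    """Return the Transformer hyperparameters quoted in the paper for each dataset scale."""
    if dataset.startswith("WMT14"):
        # Base Transformer used for WMT14 En-De / De-En.
        return dict(d_model=512, d_hidden=512, n_layers=6, n_heads=8)
    if dataset.startswith("IWSLT"):
        # Small Transformer used for IWSLT14 De-En and IWSLT16 En-De.
        return dict(d_model=256, d_hidden=256, n_layers=6, n_heads=4)
    raise ValueError(f"unknown dataset: {dataset}")


def transformer_lr(step: int, d_model: int, warmup_steps: int = 4000) -> float:
    """Inverse-square-root schedule from Vaswani et al. [2017].

    The warmup_steps default is an assumption; the paper does not state it.
    """
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    cfg = transformer_config("IWSLT14 De-En")
    print(cfg)
    print("lr at step 1000:", transformer_lr(1000, cfg["d_model"]))
```

The per-phase training step counts (Table 2 in the paper) and the curriculum schedule itself are not reproduced here, since the table is only referenced, not quoted, in the row above.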