Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation

Authors: Jinglin Liu, Yi Ren, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on IWSLT14 De-En, IWSLT16 En-De, WMT14 En-De and De-En datasets show that TCL-NAT achieves significant accuracy improvements over previous NAT baselines and reduces the performance gap between NAT and AT models to 1-2 BLEU points, demonstrating the effectiveness of our proposed method.
Researcher Affiliation | Collaboration | Zhejiang University; Microsoft Research Asia. {jinglinliu,rayeren,zc99,zhaozhou}@zju.edu.cn, {xuta,taoqin,tyliu}@microsoft.com
Pseudocode | No | The paper describes its model architecture and process with textual descriptions and a figure (Figure 1), but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the method.
Open Datasets | Yes | We evaluate our method on three standard translation datasets: IWSLT14 German-to-English (De-En) dataset, IWSLT16 English-to-German (En-De) dataset and WMT14 English-to-German (En-De) dataset. Following Li et al. [2019], we reverse WMT14 English-to-German to get WMT14 German-to-English (De-En) dataset.
Dataset Splits | Yes | IWSLT14 dataset contains 153k/7k/7k parallel bilingual sentences for training/dev/test set respectively; IWSLT16 dataset contains 195k/1k/1k parallel bilingual sentences for training/dev/test set and WMT14 dataset contains 4.5M parallel sentence pairs for the training set, where newstest2014 and newstest2013 are used as test and validation set respectively, following previous works [Gu et al., 2018; Guo et al., 2019b]. (Split sizes are restated in the first sketch after this table.)
Hardware Specification | Yes | We run the training procedure on 8 NVIDIA Tesla P100 GPUs for WMT and 2 NVIDIA 2080Ti GPUs for IWSLT datasets respectively.
Software Dependencies | No | The paper states, 'We implement our model on Tensor2Tensor [Vaswani et al., 2018].' However, it does not specify the version number of Tensor2Tensor or any other software dependencies with version numbers.
Experiment Setup | Yes | We follow Guo et al. [2019b] for configuration hyperparameters: For WMT14 datasets, we use the hyperparameters of a base Transformer (d_model = d_hidden = 512, n_layer = 6, n_head = 8). For IWSLT14 and IWSLT16 datasets, we utilize a small Transformer (d_model = d_hidden = 256, n_layer = 6, n_head = 4). ... We train all models using Adam following the optimizer settings and learning rate schedule in Transformer [Vaswani et al., 2017]. ... The training steps of each phase are listed in Table 2.
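
For quick reference, the split sizes quoted in the Dataset Splits row can be restated in code. The following is a minimal sketch assuming a plain Python dictionary; the variable name DATASET_SPLITS and the layout are illustrative choices, not part of the paper or any released code.

```python
# Minimal sketch of the train/dev/test splits reported in the paper.
# Counts are the approximate figures quoted above (153k/7k/7k, 195k/1k/1k, 4.5M);
# for WMT14, newstest2013 serves as validation and newstest2014 as test.
DATASET_SPLITS = {
    "IWSLT14 De-En": {"train": 153_000, "dev": 7_000, "test": 7_000},
    "IWSLT16 En-De": {"train": 195_000, "dev": 1_000, "test": 1_000},
    "WMT14 En-De / De-En": {
        "train": 4_500_000,
        "dev": "newstest2013",
        "test": "newstest2014",
    },
}

if __name__ == "__main__":
    for name, splits in DATASET_SPLITS.items():
        print(f"{name}: {splits}")
```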
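The Experiment Setup and Hardware Specification rows can likewise be summarized as a hedged configuration sketch. The dictionary and function names below are assumptions for illustration, not the authors' (unreleased) Tensor2Tensor hyperparameter files; the warmup_steps default of 4000 comes from the original Transformer paper [Vaswani et al., 2017], which the authors say they follow, and is not quoted in this paper.

```python
# Sketch of the reported model and training configuration, under the
# assumptions stated above.

TRANSFORMER_CONFIGS = {
    # Base Transformer for WMT14 En-De / De-En, trained on 8x NVIDIA Tesla P100.
    "wmt14": {"d_model": 512, "d_hidden": 512, "n_layers": 6, "n_heads": 8,
              "gpus": "8x NVIDIA Tesla P100"},
    # Small Transformer for IWSLT14 / IWSLT16, trained on 2x NVIDIA 2080Ti.
    "iwslt": {"d_model": 256, "d_hidden": 256, "n_layers": 6, "n_heads": 4,
              "gpus": "2x NVIDIA 2080Ti"},
}


def transformer_lr(step: int, d_model: int, warmup_steps: int = 4000) -> float:
    """Inverse-square-root learning-rate schedule from Vaswani et al. [2017]:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for name, cfg in TRANSFORMER_CONFIGS.items():
        lr = transformer_lr(8000, cfg["d_model"])
        print(f"{name}: {cfg} | Adam lr at step 8000 = {lr:.6f}")
```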