Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
Authors: Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu
AAAI 2019, pp. 3723-3730
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show our method largely outperforms the NAT baseline (Gu et al. 2017) by 5.11 BLEU scores on WMT14 English-German task and 4.72 BLEU scores on WMT16 English-Romanian task. We conduct experiments on three tasks to verify the proposed method. |
| Researcher Affiliation | Collaboration | Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China Microsoft Research Key Laboratory of Machine Perception (MOE), School of EECS, Peking University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We evaluate our model on three widely used public machine translation datasets: IWSLT14 De-En (https://wit3.fbk.eu/), WMT14 En-De (https://www.statmt.org/wmt14/translation-task) and WMT16 En-Ro (https://www.statmt.org/wmt16/translation-task), which have 153K/4.5M/2.9M bilingual sentence pairs in the corresponding training sets. |
| Dataset Splits | Yes | For WMT14 tasks, newstest2013 and newstest2014 are used as the validation and test set respectively. For the WMT16 En-Ro task, newsdev2016 is the validation set and newstest2016 is used as the test set. For IWSLT14 De-En, we use 7K data split from the training set as the validation set and use the concatenation of dev2010, tst2010, tst2011 and tst2012 as the test set, which is widely used in prior works (Ranzato et al. 2015; Bahdanau et al. 2016). (See the split sketch after this table.) |
| Hardware Specification | Yes | Models on WMT/IWSLT tasks are trained on 8/1 NVIDIA M40 GPUs respectively. ... The average per-sentence decoding latency on WMT14 En-De task over the newstest2014 test set is also reported, which is conducted on a single NVIDIA P100 GPU to keep consistent with NART (Gu et al. 2017). ... This procedure only brings 0.14ms latency per sentence on average over the newstest2014 test set on an Intel Xeon E5-2690 CPU |
| Software Dependencies | No | The paper mentions using TensorFlow (Abadi et al. 2016) and Moses (Koehn et al. 2007) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We follow the same encoder and decoder architecture as Transformer (Vaswani et al. 2017). ... For WMT14 and WMT16 datasets, we use the default hyperparameters of the base model described in Vaswani et al. (2017), whose encoder and decoder both have 6 layers, the size of hidden states and embeddings is set to 512, and the number of heads is set to 8. As IWSLT14 is a smaller dataset, we choose a smaller architecture as well, which consists of a 5-layer encoder and a 5-layer decoder. The size of hidden states and embeddings is set to 256, and the number of heads is set to 4. ... We follow the optimizer settings in Vaswani et al. (2017). ... We set µ = 0.1 and λ = 1.0 in Equation (10) for all tasks... The beam size while decoding is set to 4. ... α is set to 1.1 for English-to-Others tasks and 0.9 for Others-to-English tasks, and we try both B = 0 and B = 4, which result in 1 and 9 candidates. (See the configuration sketch after this table.) |
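
As a reading aid for the Dataset Splits row, here is a minimal sketch collecting the validation/test splits quoted above. The dictionary layout and key names are our own assumption, not an artifact from the paper; only the split names come from the quoted text.

```python
# Hypothetical summary of the splits quoted in the paper; layout is ours.
DATASET_SPLITS = {
    "wmt14_en_de": {"valid": "newstest2013", "test": "newstest2014"},
    "wmt16_en_ro": {"valid": "newsdev2016", "test": "newstest2016"},
    "iwslt14_de_en": {
        # 7K sentence pairs held out from the training set serve as validation.
        "valid": "7K pairs split from the training set",
        "test": ["dev2010", "tst2010", "tst2011", "tst2012"],  # concatenated
    },
}
```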
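
For the Experiment Setup row, the following sketch restates the reported hyperparameters and the candidate-count arithmetic in code form. It is an illustration only: the dataclass, its field names, and the `candidate_lengths` helper are assumptions rather than the authors' implementation (the paper uses TensorFlow); only the numeric values and the fact that B = 0 and B = 4 yield 1 and 9 candidates (i.e. 2B + 1) come from the text.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfig:
    """Hyperparameters quoted in the paper (field names are ours)."""
    encoder_layers: int
    decoder_layers: int
    hidden_size: int      # also the embedding size
    attention_heads: int


# Base model for WMT14 En-De and WMT16 En-Ro (Vaswani et al. 2017 defaults).
WMT_CONFIG = TransformerConfig(encoder_layers=6, decoder_layers=6,
                               hidden_size=512, attention_heads=8)

# Smaller model for IWSLT14 De-En.
IWSLT_CONFIG = TransformerConfig(encoder_layers=5, decoder_layers=5,
                                 hidden_size=256, attention_heads=4)


def candidate_lengths(source_length: int, alpha: float, B: int) -> list[int]:
    """Enumerate the 2*B + 1 candidate target lengths.

    Assumption: as in related non-autoregressive translation work, the base
    target length is the source length scaled by alpha, and candidates are
    offsets in [-B, B] around it. The paper reports alpha = 1.1 for
    English-to-Others, 0.9 for Others-to-English, and B in {0, 4},
    giving 1 and 9 candidates respectively.
    """
    base = round(alpha * source_length)
    return [max(1, base + offset) for offset in range(-B, B + 1)]


# Example: a 20-token English source sentence translated into German.
assert len(candidate_lengths(20, alpha=1.1, B=4)) == 9
assert len(candidate_lengths(20, alpha=1.1, B=0)) == 1
```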