Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information
Authors: Qiu Ran, Yankai Lin, Peng Li, Jie Zhou (pp. 13727-13735)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on various widely-used datasets show that our proposed model achieves better performance compared to most existing NAT models, and even achieves comparable translation quality as autoregressive translation models with a significant speedup. |
| Researcher Affiliation | Industry | Qiu Ran, Yankai Lin, Peng Li, Jie Zhou, Pattern Recognition Center, WeChat AI, Tencent Inc., China {soulcaptran,yankailin,patrickpli,withtomzhou}@tencent.com |
| Pseudocode | No | The paper describes the method using prose and a diagram, but it does not include pseudocode or an algorithm block. |
| Open Source Code | Yes | The source codes are available at https://github.com/ranqiu92/ReorderNAT. |
| Open Datasets | Yes | The main experiments are conducted on three widely-used machine translation tasks: WMT14 En-De (4.5M pairs), WMT16 En-Ro (610k pairs) and IWSLT16 En-De (196k pairs). ... We use the preprocessed corpus provided by Lee, Mansimov, and Cho (2018) at https://github.com/nyu-dl/dl4mt-nonauto/tree/multigpu. ... The training set consists of 1.25M sentence pairs extracted from the LDC corpora. |
| Dataset Splits | Yes | For WMT14 En-De task, we take newstest-2013 and newstest-2014 as validation and test sets respectively. For WMT16 En-Ro task, we employ newsdev-2016 and newstest-2016 as validation and test sets respectively. For IWSLT16 En-De task, we use test2013 for validation. ... We use NIST 2002 (MT02) as validation set, and NIST 2003 (MT03), 2004 (MT04), 2005 (MT05) as test sets. |
| Hardware Specification | Yes | We measure the model inference speedup on the validation set of IWSLT16 En-De task with an NVIDIA P40 GPU and set batch size to 1. (A timing sketch is given after the table.) |
| Software Dependencies | No | The paper mentions using the fast_align tool but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | For IWSLT16 En-De, we use a 5-layer Transformer model (dmodel = 278, dhidden = 507, nhead = 2, pdropout = 0.1) and anneal the learning rate linearly (from 3 × 10⁻⁴ to 10⁻⁵) as in (Lee, Mansimov, and Cho 2018). For WMT14 En-De, WMT16 En-Ro and Chinese-English translation, we use a 6-layer Transformer model (dmodel = 512, dhidden = 512, nhead = 8, pdropout = 0.1) and adopt the warm-up learning rate schedule (Vaswani et al. 2017) with twarmup = 4000. For the GRU reordering module, we use the same hidden size as the Transformer model on each dataset. We employ label smoothing of value ϵ_ls = 0.15 and utilize sequence-level knowledge distillation (Kim and Rush 2016). We also set T in Eq. 10 to 0.2 according to a grid search on the validation set. We set the beam size to 4 in the experiments. (A configuration and learning-rate sketch follows the table.) |
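
The two learning-rate schedules quoted in the Experiment Setup row are standard. Below is a minimal Python sketch of both, assuming plain per-step functions rather than the authors' actual training loop; the function names `linear_anneal_lr` and `warmup_lr` and the `wmt_config` dictionary are illustrative, not taken from the released code.

```python
def linear_anneal_lr(step, total_steps, lr_start=3e-4, lr_end=1e-5):
    """IWSLT16 En-De: linearly anneal the learning rate from 3e-4 to 1e-5."""
    frac = min(step / max(total_steps, 1), 1.0)
    return lr_start + frac * (lr_end - lr_start)


def warmup_lr(step, d_model=512, warmup_steps=4000):
    """WMT14 En-De / WMT16 En-Ro / Zh-En: inverse-sqrt warm-up schedule of
    Vaswani et al. (2017) with t_warmup = 4000."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Hyperparameters quoted in the Experiment Setup row (WMT configuration).
wmt_config = dict(
    num_layers=6, d_model=512, d_hidden=512, n_head=8, dropout=0.1,
    label_smoothing=0.15, guiding_temperature=0.2,  # T in Eq. 10
    beam_size=4,
)
```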
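
For the Hardware Specification row, the reported speedup is a wall-clock comparison against an autoregressive baseline decoded under the same conditions. The sketch below shows how such a measurement could be reproduced in principle; `at_model`, `nat_model`, and `valid_sentences` are placeholders, not names from the paper or its repository.

```python
import time


def average_latency(translate_fn, sentences):
    """Mean per-sentence decoding time in seconds, with batch size 1."""
    start = time.perf_counter()
    for sent in sentences:
        translate_fn(sent)  # decode one sentence at a time, as in the paper
    return (time.perf_counter() - start) / len(sentences)


# speedup = (average_latency(at_model.translate, valid_sentences)
#            / average_latency(nat_model.translate, valid_sentences))
```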