Synchronous Interactive Decoding for Multilingual Neural Machine Translation
Authors: Hao He, Qian Wang, Zhipeng Yu, Yang Zhao, Jiajun Zhang, Chengqing Zong (pp. 12981-12988)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We take two target languages as an example to illustrate and evaluate the proposed Sim NMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and MNMT models. |
| Researcher Affiliation | Collaboration | Hao He (1,2), Qian Wang (1,2), Zhipeng Yu (3), Yang Zhao (1,2), Jiajun Zhang (1,2), Chengqing Zong (1,2). Affiliations: 1 National Laboratory of Pattern Recognition, CASIA, Beijing 100190, China; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; 3 Beijing Fanyu Technology Co., Ltd, Beijing 100083, China |
| Pseudocode | No | The paper describes algorithms and processes but does not provide a formal pseudocode block or an explicitly labeled algorithm figure. |
| Open Source Code | No | The paper mentions modifying the 'tensor2tensor toolkit' and provides its GitHub link, but does not explicitly state that the authors' own implementation code for Sim NMT is open-sourced or provide a link to it. |
| Open Datasets | Yes | We evaluate our Sim NMT method on two translation tasks, including English to German/French (briefly, En→De/Fr) and English to Chinese/Japanese (briefly, En→Zh/Ja), on the IWSLT datasets (https://wit3.fbk.eu/). |
| Dataset Splits | Yes | The IWSLT.TED.tst2013 and IWSLT.TED.tst2015 are adopted as development set and test set, respectively. |
| Hardware Specification | Yes | The training and testing of all translation tasks are performed on a single NVIDIA GTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and the 'tensor2tensor toolkit' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Specifically, we use a Transformer with 6 encoder and 6 decoder layers, hidden size d_model = 512, 8 attention heads, a 2,048-dimensional feed-forward inner layer, and dropout probability P_dropout = 0.1. The optimizer is Adam with parameters β1 = 0.9, β2 = 0.998 and ε = 10^-9. We adopt the same warm-up and decay settings as Vaswani et al. (2017). For testing, we use beam search with beam size k = 8 and allocate 2 beams for each type of translation hypotheses (specifically for 2 target languages in our experiments) with length penalty α = 0.6. A configuration sketch restating these settings follows the table. |
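
The reported settings can be collected into a small configuration sketch. This is not the authors' implementation (no code release is linked above); it merely restates the hyperparameters quoted in the table and illustrates, under two explicit assumptions, how the decoding-time bookkeeping might look: (a) the beam of size k = 8 is partitioned into equal groups of 2 slots per hypothesis type, and (b) the length penalty is the GNMT-style formula ((5 + |Y|) / 6)^α. All names in the snippet (SIM_NMT_HPARAMS, beam_groups, length_penalty) are illustrative, not from the paper.

```python
# Illustrative sketch only: restates the hyperparameters reported in the table
# and shows one plausible way to organise the beam-allocation bookkeeping.
# All names here are hypothetical; the paper does not release code.

SIM_NMT_HPARAMS = {
    # Transformer architecture (base size, as in Vaswani et al. 2017)
    "num_encoder_layers": 6,
    "num_decoder_layers": 6,
    "hidden_size": 512,           # d_model
    "num_heads": 8,
    "filter_size": 2048,          # feed-forward inner layer size
    "dropout": 0.1,               # P_dropout
    # Adam optimizer settings
    "adam_beta1": 0.9,
    "adam_beta2": 0.998,
    "adam_epsilon": 1e-9,
    # Decoding settings
    "beam_size": 8,               # k
    "beams_per_hypothesis_type": 2,
    "length_penalty_alpha": 0.6,  # α
}


def beam_groups(beam_size: int, beams_per_type: int) -> list[range]:
    """Partition beam slots into equal groups, one group per hypothesis type.

    With beam_size = 8 and 2 beams per type this gives 4 groups of 2 slots.
    How the groups map onto the two target languages is not specified in the
    quoted text, so this is only an illustration of the bookkeeping.
    """
    num_types = beam_size // beams_per_type
    return [
        range(i * beams_per_type, (i + 1) * beams_per_type)
        for i in range(num_types)
    ]


def length_penalty(length: int, alpha: float = 0.6) -> float:
    """GNMT-style length penalty ((5 + |Y|) / 6) ** alpha (assumed form)."""
    return ((5.0 + length) / 6.0) ** alpha


if __name__ == "__main__":
    hp = SIM_NMT_HPARAMS
    print(beam_groups(hp["beam_size"], hp["beams_per_hypothesis_type"]))
    # [range(0, 2), range(2, 4), range(4, 6), range(6, 8)]
    print(round(length_penalty(20, hp["length_penalty_alpha"]), 2))  # 2.35
```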