Synchronous Interactive Decoding for Multilingual Neural Machine Translation

Authors: Hao He, Qian Wang, Zhipeng Yu, Yang Zhao, Jiajun Zhang, Chengqing Zong (pp. 12981-12988)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We take two target languages as an example to illustrate and evaluate the proposed SimNMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and MNMT models.
Researcher Affiliation | Collaboration | Hao He (1,2), Qian Wang (1,2), Zhipeng Yu (3), Yang Zhao (1,2), Jiajun Zhang (1,2), Chengqing Zong (1,2); 1) National Laboratory of Pattern Recognition, CASIA, Beijing 100190, China; 2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; 3) Beijing Fanyu Technology Co., Ltd, Beijing 100083, China
Pseudocode | No | The paper describes algorithms and processes but does not provide a formal pseudocode block or an explicitly labeled algorithm figure.
Open Source Code | No | The paper mentions modifying the tensor2tensor toolkit and provides its GitHub link, but it does not state that the authors' own SimNMT implementation is open-sourced, nor does it provide a link to such code.
Open Datasets | Yes | We evaluate our SimNMT method on two translation tasks, including English to German/French (briefly, En→De/Fr) and English to Chinese/Japanese (briefly, En→Zh/Ja), on the IWSLT datasets (https://wit3.fbk.eu/).
Dataset Splits | Yes | The IWSLT.TED.tst2013 and IWSLT.TED.tst2015 are adopted as development set and test set, respectively.
Hardware Specification | Yes | The training and testing of all translation tasks are performed on a single NVIDIA GTX 2080Ti GPU.
Software Dependencies | No | The paper mentions 'TensorFlow' and the 'tensor2tensor toolkit' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Specifically, we use a Transformer with 6 encoder and 6 decoder layers, hidden size d_model = 512, 8 attention heads, a 2,048-dimensional feed-forward inner layer, and P_dropout = 0.1. The optimizer is Adam with parameters β1 = 0.9, β2 = 0.998, and ε = 10⁻⁹. We adopt the same warm-up and decay settings as Vaswani et al. (2017). For testing, we use beam search with beam size k = 8 and allocate 2 beams for each type of translation hypothesis (specifically, for the 2 target languages in our experiments), with length penalty α = 0.6.
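For reference, the reported setup can be collected into a single configuration object. The following is a minimal, illustrative Python sketch based only on the values quoted above; the dictionary keys, the SIM_NMT_CONFIG name, and the noam_lr helper are assumed names for illustration (they are not taken from the paper or the tensor2tensor toolkit), and the warm-up step count is an assumed default since the paper only states that it follows Vaswani et al. (2017).

```python
# Illustrative summary of the reported configuration.
# Key names are assumptions, not the authors' actual tensor2tensor settings.
SIM_NMT_CONFIG = {
    # Transformer architecture (Transformer-base sized)
    "num_encoder_layers": 6,
    "num_decoder_layers": 6,
    "hidden_size": 512,              # d_model
    "num_heads": 8,
    "filter_size": 2048,             # feed-forward inner layer size
    "dropout": 0.1,                  # P_dropout

    # Adam optimizer settings
    "adam_beta1": 0.9,
    "adam_beta2": 0.998,
    "adam_epsilon": 1e-9,

    # Data splits (IWSLT)
    "dev_set": "IWSLT.TED.tst2013",
    "test_set": "IWSLT.TED.tst2015",

    # Decoding settings
    "beam_size": 8,                  # total beam size k
    "beams_per_hypothesis_type": 2,  # 2 beams per target language here
    "length_penalty_alpha": 0.6,
}


def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Warm-up/decay schedule of Vaswani et al. (2017).

    warmup_steps = 4000 is an assumed default; the paper only says it
    follows the original Transformer settings.
    """
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```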