Synchronous Interactive Decoding for Multilingual Neural Machine Translation

Authors: Hao He, Qian Wang, Zhipeng Yu, Yang Zhao, Jiajun Zhang, Chengqing Zong (pp. 12981-12988)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We take two target languages as an example to illustrate and evaluate the proposed SimNMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and MNMT models.
Researcher Affiliation | Collaboration | Hao He (1,2), Qian Wang (1,2), Zhipeng Yu (3), Yang Zhao (1,2), Jiajun Zhang (1,2), Chengqing Zong (1,2); 1) National Laboratory of Pattern Recognition, CASIA, Beijing 100190, China; 2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; 3) Beijing Fanyu Technology Co., Ltd, Beijing 100083, China
Pseudocode | No | The paper describes algorithms and processes but does not provide a formal pseudocode block or an explicitly labeled algorithm figure.
Open Source Code | No | The paper mentions modifying the tensor2tensor toolkit and provides its GitHub link, but it does not state that the authors' own SimNMT implementation is open-sourced, nor does it provide a link to such code.
Open Datasets | Yes | We evaluate our SimNMT method on two translation tasks, including English to German/French (briefly, En→De/Fr) and English to Chinese/Japanese (briefly, En→Zh/Ja), on the IWSLT datasets (https://wit3.fbk.eu/).
Dataset Splits | Yes | The IWSLT.TED.tst2013 and IWSLT.TED.tst2015 are adopted as development set and test set, respectively.
Hardware Specification | Yes | The training and testing of all translation tasks are performed on a single NVIDIA GTX 2080Ti GPU.
Software Dependencies | No | The paper mentions 'TensorFlow' and the 'tensor2tensor toolkit' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Specifically, we use a Transformer with 6 encoder and 6 decoder layers, hidden size d_model = 512, 8 attention heads, a 2,048-dimensional feed-forward inner layer, and P_dropout = 0.1. The optimizer is Adam with parameters β1 = 0.9, β2 = 0.998, and ε = 10⁻⁹. We adopt the same warm-up and decay settings as Vaswani et al. (2017). For testing, we use beam search with beam size k = 8 and allocate 2 beams for each type of translation hypothesis (specifically, for the 2 target languages in our experiments), with length penalty α = 0.6.
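For reference, the reported setup can be collected into a single configuration object. The following is a minimal, illustrative Python sketch based only on the values quoted above; the dictionary keys, the SIM_NMT_CONFIG name, and the noam_lr helper are assumed names for illustration (they are not taken from the paper or the tensor2tensor toolkit), and the warm-up step count is an assumed default since the paper only states that it follows Vaswani et al. (2017).

```python
# Illustrative summary of the reported configuration.
# Key names are assumptions, not the authors' actual tensor2tensor settings.
SIM_NMT_CONFIG = {
    # Transformer architecture (Transformer-base sized)
    "num_encoder_layers": 6,
    "num_decoder_layers": 6,
    "hidden_size": 512,              # d_model
    "num_heads": 8,
    "filter_size": 2048,             # feed-forward inner layer size
    "dropout": 0.1,                  # P_dropout

    # Adam optimizer settings
    "adam_beta1": 0.9,
    "adam_beta2": 0.998,
    "adam_epsilon": 1e-9,

    # Data splits (IWSLT)
    "dev_set": "IWSLT.TED.tst2013",
    "test_set": "IWSLT.TED.tst2015",

    # Decoding settings
    "beam_size": 8,                  # total beam size k
    "beams_per_hypothesis_type": 2,  # 2 beams per target language here
    "length_penalty_alpha": 0.6,
}


def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Warm-up/decay schedule of Vaswani et al. (2017).

    warmup_steps = 4000 is an assumed default; the paper only says it
    follows the original Transformer settings.
    """
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```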