Transductive Ensemble Learning for Neural Machine Translation
Authors: Yiren Wang, Lijun Wu, Yingce Xia, Tao Qin, ChengXiang Zhai, Tie-Yan Liu
AAAI 2020, pp. 6291-6298
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on different settings (with/without monolingual data) and different language pairs (English↔{German, Finnish}). The results show that our approach boosts strong individual models with significant improvement and benefits a lot from more individual models. Specifically, we achieve the state-of-the-art performances on the WMT2016-2018 English→German translations. |
| Researcher Affiliation | Collaboration | 1 University of Illinois at Urbana-Champaign; 2 School of Data and Computer Science, Sun Yat-sen University; 3 Microsoft Research Asia. 1 {yiren, czhai}@illinois.edu, 2 wulijun3@mail2.sysu.edu.cn, 3 {Yingce.Xia, taoqin, tyliu}@microsoft.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that "The experiments are based on the PyTorch implementation of Transformer" and provides a link to the fairseq GitHub repository (https://github.com/pytorch/fairseq). However, this is a third-party framework used for implementation, not the specific source code for the proposed Transductive Ensemble Learning (TEL) method by the authors. |
| Open Datasets | Yes | The majority of our empirical studies are conducted on the WMT2019 English→German (En→De) and German→English (De→En) news translation tasks. We use 5M bitext as our training data...We also experiment on another two more translation tasks, WMT2019 English→Finnish (En→Fi) and Finnish→English (Fi→En) news translations... |
| Dataset Splits | Yes | We use Newstest2015 as the validation set for model selection. |
| Hardware Specification | Yes | The models are trained on 8 M40 GPUs with a batch size of 4096. |
| Software Dependencies | No | The paper states, "The experiments are based on the PyTorch implementation of Transformer." However, it does not specify a version number for PyTorch or any other software libraries used. |
| Experiment Setup | Yes | The dimensions of word embeddings, hidden states and non-linear layer are set as 1024, 1024 and 4096 respectively, and the number of heads for multi-head attention is set as 16. The dropout is 0.3 for both En↔De and En↔Fi. All models are optimized with Adam (Kingma and Ba 2015) following the optimizer settings and learning rate schedule in (Vaswani et al. 2017). The models are trained on 8 M40 GPUs with a batch size of 4096. |
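
To make the reported hyperparameters concrete, the sketch below reproduces the described Transformer-big configuration in plain PyTorch rather than the authors' fairseq setup. The layer count, warmup steps, and base learning rate are not stated in the quoted setup; they are assumptions taken from the Vaswani et al. (2017) defaults that the paper says it follows.

```python
# Minimal sketch of the reported Transformer-big setup, assuming Vaswani et al.
# (2017) defaults for values the paper does not state (layer count, warmup).
import torch
import torch.nn as nn

D_MODEL = 1024        # word embedding / hidden state dimension (stated)
FFN_DIM = 4096        # non-linear (feed-forward) layer dimension (stated)
NUM_HEADS = 16        # multi-head attention heads (stated)
DROPOUT = 0.3         # dropout for En<->De and En<->Fi (stated)
NUM_LAYERS = 6        # assumed: standard Transformer-big depth
WARMUP_STEPS = 4000   # assumed: Vaswani et al. (2017) default warmup

model = nn.Transformer(
    d_model=D_MODEL,
    nhead=NUM_HEADS,
    num_encoder_layers=NUM_LAYERS,
    num_decoder_layers=NUM_LAYERS,
    dim_feedforward=FFN_DIM,
    dropout=DROPOUT,
)

# Adam with the original Transformer settings (betas=(0.9, 0.98), eps=1e-9)
# and the inverse-square-root warmup learning rate schedule.
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
                             betas=(0.9, 0.98), eps=1e-9)

def inverse_sqrt_with_warmup(step: int) -> float:
    """lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)
    return (D_MODEL ** -0.5) * min(step ** -0.5, step * WARMUP_STEPS ** -1.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, inverse_sqrt_with_warmup)
```

With a base learning rate of 1.0, the `LambdaLR` multiplier equals the effective learning rate, matching the inverse-square-root schedule; the sketch only mirrors the quoted dimensions, heads, dropout, and optimizer, not the authors' full training pipeline or the batch size of 4096 per GPU.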