Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement

Authors: Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Longyue Wang, Shuming Shi, Tong Zhang

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement our algorithm on top of the state-of-the-art neural machine translation model TRANSFORMER and conduct experiments on the widely-used WMT14 English⇒German and WMT17 Chinese⇒English translation datasets. Experimental results across language pairs show that the proposed approach consistently outperforms the strong baseline model and a representative static aggregation model.
Researcher Affiliation | Collaboration | Zi-Yi Dou, Carnegie Mellon University, zdou@andrew.cmu.edu; Zhaopeng Tu*, Tencent AI Lab, zptu@tencent.com; Xing Wang, Tencent AI Lab, brightxwang@tencent.com; Longyue Wang, Tencent AI Lab, vinnylywang@tencent.com; Shuming Shi, Tencent AI Lab, shumingshi@tencent.com; Tong Zhang, Tencent AI Lab, bradymzhang@tencent.com
Pseudocode | Yes | Algorithm 1: Iterative Dynamic Routing ... Algorithm 2: Iterative EM Routing (a hedged sketch of the dynamic-routing procedure follows the table)
Open Source Code | No | The paper does not contain an explicit statement about releasing open-source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We conducted experiments on two widely-used WMT14 English⇒German (En⇒De) and WMT17 Chinese⇒English (Zh⇒En) translation tasks and compared our model with results reported by previous work (Gehring et al. 2017; Vaswani et al. 2017; Hassan et al. 2018).
Dataset Splits | Yes | For the En⇒De task, the training corpus consists of about 4.56 million sentence pairs. We used newstest2013 as the development set and newstest2014 as the test set. For the Zh⇒En task, we used all of the available parallel data, consisting of about 20 million sentence pairs. We used newsdev2017 as the development set and newstest2017 as the test set.
Hardware Specification | Yes | All the models were trained on eight NVIDIA P40 GPUs, where each was allocated a batch size of 4096 tokens.
Software Dependencies | No | The paper mentions using 'byte-pair encoding' and building on the Transformer model, but it does not specify any software dependencies with version numbers (e.g., a PyTorch or TensorFlow version).
Experiment Setup | Yes | We followed the configurations in (Vaswani et al. 2017), and reproduced their reported results on the En⇒De task. ... All the models were trained on eight NVIDIA P40 GPUs, where each was allocated a batch size of 4096 tokens. ... The number of output capsules N is a key parameter for our model... Another key parameter is the number of routing iterations T... (a hedged configuration sketch follows the table)
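
The table notes that the paper provides pseudocode for two routing schemes (iterative dynamic routing and iterative EM routing). As a reference point only, and not the authors' code, the following is a minimal NumPy sketch of iterative dynamic routing in the style of Sabour et al. (2017) applied to layer aggregation: the outputs of the L stacked layers act as input capsules and are routed into N output capsules over T iterations. The shapes, the per-pair transformation matrices W, and the squash non-linearity are assumptions; the paper's Algorithm 1 may differ in detail.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Non-linear squashing: keeps the direction, maps the norm into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

def dynamic_routing(layer_outputs, n_out_capsules=4, n_iters=3, seed=0):
    """
    Iterative dynamic routing over the outputs of L stacked layers.

    layer_outputs: array of shape (L, d), one vector per layer
                   (a full sequence would add a length dimension).
    Returns the N output capsules, shape (n_out_capsules, d).
    """
    L, d = layer_outputs.shape
    rng = np.random.default_rng(seed)

    # Per-(input, output) transformation matrices W[i, j]: d -> d.
    W = rng.normal(scale=0.01, size=(L, n_out_capsules, d, d))
    # "Vote" of input capsule i for output capsule j.
    u_hat = np.einsum('id,ijde->ije', layer_outputs, W)      # (L, N, d)

    # Routing logits start uniform and are refined by agreement.
    b = np.zeros((L, n_out_capsules))
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = np.einsum('ij,ije->je', c, u_hat)                 # weighted sum of votes
        v = squash(s)                                         # (N, d)
        b = b + np.einsum('ije,je->ij', u_hat, v)             # agreement update
    return v

# Toy usage: aggregate 6 layer outputs of dimension 8 into 4 capsules.
layers = np.random.default_rng(1).normal(size=(6, 8))
capsules = dynamic_routing(layers, n_out_capsules=4, n_iters=3)
print(capsules.shape)  # (4, 8)
```

In this reading, the resulting output capsules would be combined (e.g., concatenated and projected) into the final aggregated representation; EM routing replaces the softmax-by-agreement update with an expectation-maximization step, which is not sketched here.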
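
For quick reference, the hedged sketch below restates the quoted training setup as a Python configuration. Only the items quoted in the table (the Vaswani et al. (2017) Transformer configuration, eight NVIDIA P40 GPUs, and 4096 tokens per GPU) come from the paper; N and T are left unset because the table only identifies them as key hyperparameters, and the effective-batch computation is an inferred arithmetic convenience rather than a figure the paper states.

```python
# Hedged summary of the quoted training setup; unset values are placeholders,
# not the authors' numbers.
config = {
    "base_model": "Transformer (Vaswani et al. 2017 configuration)",
    "num_gpus": 8,              # NVIDIA P40
    "tokens_per_gpu": 4096,     # batch size per GPU, in tokens
    "n_output_capsules": None,  # N: key hyperparameter, tuned in the paper
    "routing_iterations": None, # T: key hyperparameter, tuned in the paper
}

# Inferred tokens processed per optimization step across all GPUs.
effective_batch_tokens = config["num_gpus"] * config["tokens_per_gpu"]
print(effective_batch_tokens)  # 32768 tokens per step
```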