Multi-Channel Encoder for Neural Machine Translation

Authors: Hao Xiong, Zhongjun He, Xiaoguang Hu, Hua Wu

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical study on Chinese-English translation shows that our model can improve by 6.52 BLEU points upon a strong open source NMT system: DL4MT. On the WMT14 English-French task, our single shallow system achieves BLEU=38.8, comparable with the state-of-the-art deep models.
Researcher Affiliation | Industry | Hao Xiong, Zhongjun He, Xiaoguang Hu, Hua Wu; Baidu Inc., No. 10, Shangdi 10th Street, Beijing, 100085, China; {xionghao05, hezhongjun, huxiaoguang, wu_hua}@baidu.com
Pseudocode | No | The paper describes methods using equations and diagrams (Figure 2) but does not contain a formal pseudocode block or algorithm.
Open Source Code | No | Footnotes 1, 5, and 6 on page 3 list URLs for DL4MT, T2T, and ConvS2S, which are open-source toolkits used for comparison, not the authors' own implementation code for MCE. There is no explicit statement or link for the MCE implementation.
Open Datasets | Yes | We use a subset of the data available for NIST Open MT08 task and WMT14 parallel corpus as our training data. The detailed data sets are Europarl v7, Common Crawl, UN, News Commentary, Gigaword.
Dataset Splits | Yes | For the Chinese-English task, we choose the NIST 2006 (NIST06) dataset as our development set, and the NIST 2003 (NIST03), 2004 (NIST04), 2005 (NIST05), 2008 (NIST08), and 2012 (NIST12) datasets as our test sets. For the English-French task, the news-test-2012 and news-test-2013 sets are concatenated as our development set, and news-test-2014 is the test set. (A sketch of these splits appears after this table.)
Hardware Specification | Yes | As we set the batch size to 128, on the Chinese-English task it takes around 1 day to train the basic model on 8 NVIDIA P40 GPUs, and on the English-French task it takes around 7 days.
Software Dependencies | No | For the Chinese-English task, we run the widely used open source toolkit DL4MT together with two recently published strong open source toolkits, T2T and ConvS2S, on the same experimental settings to validate the performance of our models. Beyond that, we also reimplement an attention-based NMT written in TensorFlow as our baseline system. Toolkit and framework names are given, but the quoted setup does not specify version numbers.
Experiment Setup | Yes | We use 512-dimensional word embeddings for both the source and target languages. All hidden layers, both in the encoder and the decoder, have 512 memory cells. The output layer size is the same as the hidden size. The dimension of c_j is 1024. ... we apply gradient clipping: ... 1.0 in our case. ... we use the Adam optimizer with β1 = 0.9, β2 = 0.98 and ϵ = 10^-9. ... we set the batch size to 128 ... And we use a beam width of 10 in all the experiments. ... we set the dropout rate to 0.5. (These hyperparameters are collected into a configuration sketch after this table.)
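
The split layout quoted in the Dataset Splits row can be summarized in a few lines of Python. This is only an illustrative sketch; the keys and set names below are labels chosen here, not file names or paths from the authors' pipeline.

    # Illustrative summary of the dev/test splits reported for the two tasks.
    # Set names are labels for this sketch, not actual corpus file names.
    SPLITS = {
        "zh-en": {
            "dev": ["NIST06"],
            "test": ["NIST03", "NIST04", "NIST05", "NIST08", "NIST12"],
        },
        "en-fr": {
            # news-test-2012 and news-test-2013 are concatenated into one dev set
            "dev": ["news-test-2012", "news-test-2013"],
            "test": ["news-test-2014"],
        },
    }

    if __name__ == "__main__":
        for pair, sets in SPLITS.items():
            print(f"{pair}: dev={sets['dev']}, test={sets['test']}")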
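
The Experiment Setup row amounts to a training configuration. The snippet below is a minimal sketch that collects the reported values and builds an Adam optimizer with the TensorFlow/Keras API; it is not the authors' code (the paper only says their baseline is written in TensorFlow), and the learning rate is not given in the quoted setup, so the value used here is a placeholder assumption.

    # Hypothetical configuration sketch; not the authors' implementation.
    import tensorflow as tf

    HPARAMS = {
        "embedding_dim": 512,   # source and target word embeddings
        "hidden_size": 512,     # encoder/decoder hidden layers and output layer
        "context_dim": 1024,    # dimension of the context vector c_j
        "batch_size": 128,
        "beam_width": 10,       # beam search width used at decoding time
        "dropout_rate": 0.5,
        "grad_clip": 1.0,       # gradient clipping threshold
    }

    # The paper reports Adam with beta1 = 0.9, beta2 = 0.98, epsilon = 1e-9.
    # The learning rate is not stated in the quoted setup; 1e-4 is a placeholder.
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=1e-4,
        beta_1=0.9,
        beta_2=0.98,
        epsilon=1e-9,
        clipnorm=HPARAMS["grad_clip"],
    )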