Modeling Coherence for Discourse Neural Machine Translation

Authors: Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang

AAAI 2019, pp. 7338-7345

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Practical results on multiple discourse test datasets indicate that our model significantly improves the translation quality over the state-of-the-art baseline system by +1.23 BLEU score. Moreover, our model generates more discourse coherent text and obtains +2.2 BLEU improvements when evaluated by discourse metrics.
Researcher Affiliation | Industry | Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang. Baidu Inc., No. 10, Shangdi 10th Street, Beijing, 100085, China. {xionghao05, hezhongjun, wu_hua, wanghaifeng}@baidu.com
Pseudocode | No | The paper describes its model architecture and training procedures but does not include any pseudocode or explicitly labeled algorithm blocks.
Open Source Code | No | The paper refers to third-party open-source toolkits like "t2t" and "Moses Toolkit" (Footnote 2), but does not state that the authors are releasing their own code for the proposed method.
Open Datasets | Yes | We evaluate the performance of our model on the IWSLT speech translation task with TED talks (Cettolo, Girardi, and Federico 2012) as training corpus, which includes multiple entire talks.
Dataset Splits | Yes | Specifically, we take the dev-2010 as our development set, and tst-2013~2015 as our test sets. Statistically, we have 14,258 talks and 231,266 sentences in the training data, 48 talks and 879 sentences in the development set, and 234 talks and 3,874 sentences in the test sets.
Hardware Specification | Yes | The training speed of two-pass-bleu-rl model is 8 talks per one second running on V100 with 8 GPUs, and it needs about 1.5 days to converge.
Software Dependencies | Yes | t2t: This is the official supplied open source toolkit for running Transformer model. Specifically, we use the v1.6.5 release. (A version-check sketch for this pinned release follows the table.)
Experiment Setup | Yes | For all systems, we use the Adam Optimizer (Kingma and Ba 2015) with the identical settings to t2t, to tune the parameters. One thing deserves to be noted is the value of hyperparameter batch size. In general, a large value of batch size achieves better performance when training on large scale corpus (more than millions) (Vaswani et al. 2017). Thus we set the batch size to 320 for t2t system... we set both the embedding and recurrent hidden size to 100, and apply one dropout layer with keeping probability equals to 0.3 between the embedding layer and the bidirectional recurrent layers. As shown in Figure 2, we see that setting the value of λ1 to 0.85 and λ2 to 0.80 produces the best performance for first-pass-rl and two-pass-rl.
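The Experiment Setup row fixes the coherence model's sizes (embedding and recurrent hidden size of 100, a dropout layer with keep probability 0.3 between the embedding layer and the bidirectional recurrent layers) and the best reward weights (λ1 = 0.85 for first-pass-rl, λ2 = 0.80 for two-pass-rl). The sketch below is an illustrative reconstruction rather than the authors' code: the choice of PyTorch, the class name CoherenceEncoder, and the LSTM cell are assumptions; only the layer sizes, the keep probability, and the λ values come from the quoted text.

```python
# Hedged reconstruction of the coherence encoder sizes quoted above.
# PyTorch, the class name, and the LSTM cell are assumptions; the paper
# only specifies the sizes, the keep probability, and the lambda values.
import torch
import torch.nn as nn


class CoherenceEncoder(nn.Module):
    """Bidirectional recurrent encoder with the reported sizes."""

    def __init__(self, vocab_size: int, emb_size: int = 100, hidden_size: int = 100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        # The paper reports a *keep* probability of 0.3 between the embedding
        # and the bidirectional recurrent layers; nn.Dropout expects the *drop*
        # probability, hence 1.0 - 0.3.
        self.dropout = nn.Dropout(p=1.0 - 0.3)
        self.birnn = nn.LSTM(emb_size, hidden_size,
                             bidirectional=True, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.dropout(self.embedding(token_ids))
        outputs, _ = self.birnn(x)
        return outputs  # shape: [batch, seq_len, 2 * hidden_size]


# Best reward weights reported on the development set (Figure 2 of the paper).
# How each lambda combines the translation and coherence rewards is not
# reproduced here; 0.85 and 0.80 are simply the reported optima.
LAMBDA_FIRST_PASS_RL = 0.85
LAMBDA_TWO_PASS_RL = 0.80
```

Once a vocabulary size is fixed, CoherenceEncoder(vocab_size)(token_ids) accepts a [batch, seq_len] tensor of token ids and returns the concatenated forward and backward hidden states.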
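The Software Dependencies row pins the toolkit to the t2t v1.6.5 release. The snippet below is a minimal, generic sketch rather than anything from the paper: it assumes Python 3.8+ and that the toolkit is installed from PyPI under the name tensor2tensor, and simply confirms that the pinned release is present before a reproduction run.

```python
# Minimal sketch (assumption: t2t installed from PyPI as "tensor2tensor",
# Python 3.8+ for importlib.metadata). Checks the pinned v1.6.5 release.
from importlib.metadata import PackageNotFoundError, version

EXPECTED = "1.6.5"  # release reported in the paper

try:
    installed = version("tensor2tensor")
except PackageNotFoundError:
    raise SystemExit("tensor2tensor is not installed; install the v1.6.5 release to match the paper.")

if installed == EXPECTED:
    print(f"tensor2tensor {installed} matches the reported release.")
else:
    print(f"Warning: found tensor2tensor {installed}; the paper reports v{EXPECTED}.")
```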