Asynchronous Bidirectional Decoding for Neural Machine Translation
Authors: Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, Hongji Wang
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. We evaluated the proposed model on NIST Chinese-English and WMT English-German translation tasks. |
| Researcher Affiliation | Academia | Xiamen University, Xiamen, China; Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes the model using mathematical equations and textual explanations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT. |
| Open Datasets | Yes | For Chinese-English translation, the training data consists of 1.25M bilingual sentences with 27.9M Chinese words and 34.5M English words. These sentence pairs are mainly extracted from LDC2002E18, LDC2003E07, LDC2003E14, Hansards portion of LDC2004T07, LDC2004T08 and LDC2005T06. For English-German translation, we used WMT 2015 training data that contains 4.46M sentence pairs with 116.1M English words and 108.9M German words. |
| Dataset Splits | Yes | We chose NIST 2002 (MT02) dataset as our development set, and the NIST 2003 (MT03), 2004 (MT04), 2005 (MT05), and 2006 (MT06) datasets as our test sets. The news-test 2013 was used as development set and the news-test 2015 as test set. |
| Hardware Specification | Yes | We used a single GPU device 1080Ti to train models. |
| Software Dependencies | No | The paper mentions using Gated Recurrent Unit (GRU), Rmsprop, and byte pair encoding (BPE) but does not provide specific version numbers for any software libraries or frameworks (e.g., TensorFlow, PyTorch, Caffe) used for implementation. |
| Experiment Setup | Yes | During this procedure, we set the following hyper-parameters: word embedding dimension as 620, hidden layer size as 1000, learning rate as 5 × 10⁻⁴, batch size as 80, gradient norm as 1.0, and dropout rate as 0.3. We set beam sizes of all above-mentioned models as 10, and the beam sizes of the backward and forward decoders of our model as 1 and 10, respectively. (These settings are collected in the configuration sketch after this table.) |
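As a rough illustration of the reported setup, the hyper-parameters and beam settings quoted above can be gathered into a single configuration. This is a minimal sketch, not the authors' released code (https://github.com/DeepLearnXMU/ABDNMT); the dictionary name and keys are hypothetical and chosen only to mirror the values stated in the paper.

```python
# Hypothetical configuration sketch assembling the hyper-parameters
# reported in the paper; names are illustrative, not from the repo.
ABDNMT_CONFIG = {
    "word_embedding_dim": 620,   # reported word embedding dimension
    "hidden_size": 1000,         # reported hidden layer size
    "learning_rate": 5e-4,       # reported learning rate (5 x 10^-4)
    "batch_size": 80,
    "gradient_clip_norm": 1.0,   # reported gradient norm
    "dropout": 0.3,
    "optimizer": "rmsprop",      # optimizer named in the paper

    # Decoding: the backward decoder runs with beam size 1 to produce
    # reverse-order hidden states, which the forward decoder then
    # exploits while searching with beam size 10.
    "backward_beam_size": 1,
    "forward_beam_size": 10,
}

if __name__ == "__main__":
    for key, value in ABDNMT_CONFIG.items():
        print(f"{key}: {value}")
```

The asymmetric beam sizes reflect the asynchronous design: the backward pass only needs to supply auxiliary hidden states, so a greedy (beam 1) backward decode keeps inference cheap while the forward decoder retains the usual beam of 10.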