Joint Training for Neural Machine Translation Models with Monolingual Data
Authors: Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on Chinese-English and English-German translation tasks show that our approach can simultaneously improve translation quality of source-to-target and target-to-source models, significantly outperforming strong baseline systems which are enhanced with monolingual data for model training including back-translation. |
| Researcher Affiliation | Collaboration | University of Science and Technology of China, Hefei, China; Microsoft Research |
| Pseudocode | Yes | Algorithm 1: Joint Training Algorithm for NMT (a hedged sketch of this loop appears after the table). |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the open-source availability of the code for the described methodology. |
| Open Datasets | Yes | For Chinese-English translation, we select our training data from LDC corpora, which consists of 2.6M sentence pairs with 65.1M Chinese words and 67.1M English words respectively. We use 8M Chinese sentences and 8M English sentences randomly extracted from the Xinhua portion of the Gigaword corpus as the monolingual data sets. Any sentence longer than 60 words is removed from training data (both the bilingual data and the pseudo bilingual data; a minimal length-filter sketch appears after the table). For English-German translation, we choose the WMT'14 training corpus used in Jean et al. (2015). |
| Dataset Splits | Yes | For Chinese-English, the NIST Open MT 2006 evaluation set is used as the validation set, and the NIST 2003, NIST 2005, NIST 2008, and NIST 2012 datasets as test sets. For English-German, the concatenation of news-test 2012 and news-test 2013 is used as the validation set and news-test 2014 as the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or specific cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'RNNSearch model proposed by Bahdanau, Cho, and Bengio (2014)' and optimization with 'Adadelta (Zeiler 2012) algorithm', and 'Byte Pair Encoding (Sennrich, Haddow, and Birch 2016b)'. However, it does not specify version numbers for any software or libraries (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | The size of word embedding (for both source and target words) is 256 and the size of hidden layer is set to 1024. The parameters are initialized using a normal distribution with a mean of 0 and a variance of 6/(d_row + d_col). Our models are optimized with the Adadelta (Zeiler 2012) algorithm with mini-batch size 128. We re-normalize the gradient if its norm is larger than 2.0 (Pascanu, Mikolov, and Bengio 2013). At test time, beam search with size 8 is employed to find the best translation, and translation probabilities are normalized by the length of the translation sentences. In practice, we first sort all monolingual data according to sentence length and then 64 sentences are simultaneously translated with a parallel decoding implementation. As for model training, we find that 4-5 EM iterations are enough to converge. (A hedged sketch of these optimization settings appears after the table.) |
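
The paper's Algorithm 1 alternately uses each translation direction to back-translate the other side's monolingual data into a pseudo-parallel corpus, then retrains both models on the union of true and pseudo bitext. Below is a minimal sketch of that outer loop; `bitext`, `mono_src`, `mono_tgt`, and the `train`/`translate` callables are hypothetical placeholders rather than the authors' implementation, and the paper's weighting of n-best pseudo translations by their normalized probabilities is omitted.

```python
# Hedged sketch of the joint training loop (Algorithm 1). All names are
# illustrative placeholders; probability weighting of n-best pseudo
# translations, used in the paper, is omitted for brevity.

def joint_training(s2t, t2s, bitext, mono_src, mono_tgt, train, translate,
                   iterations=5):
    """Jointly improve a source-to-target (s2t) and a target-to-source (t2s) model.

    bitext:   list of (src, tgt) sentence pairs
    mono_src: monolingual source sentences
    mono_tgt: monolingual target sentences
    train(model, pairs) and translate(model, sentence) are user-supplied.
    """
    # Warm-start both directions on the true bilingual data only.
    train(s2t, bitext)
    train(t2s, [(y, x) for (x, y) in bitext])

    for _ in range(iterations):  # the paper reports 4-5 EM iterations suffice
        # Each model back-translates the other side's monolingual data,
        # producing a pseudo-parallel corpus for its counterpart.
        pseudo_s2t = [(translate(t2s, y), y) for y in mono_tgt]  # (x', y) pairs
        pseudo_t2s = [(translate(s2t, x), x) for x in mono_src]  # (y', x) pairs

        # Retrain each direction on the true bitext plus its pseudo corpus.
        train(s2t, bitext + pseudo_s2t)
        train(t2s, [(y, x) for (x, y) in bitext] + pseudo_t2s)

    return s2t, t2s
```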
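The data preparation quoted above removes any training sentence longer than 60 words from both the bilingual and pseudo-bilingual data. A minimal filter under the assumption of whitespace tokenization (the paper only states the 60-word cut-off, not how words are counted):

```python
def filter_by_length(pairs, max_len=60):
    """Drop sentence pairs in which either side exceeds max_len words.

    `pairs` holds (source, target) strings; whitespace tokenization is an
    assumption, since the paper only specifies the 60-word limit.
    """
    return [
        (src, tgt)
        for src, tgt in pairs
        if len(src.split()) <= max_len and len(tgt.split()) <= max_len
    ]
```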
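The optimization details in the last row (normal initialization with variance 6/(d_row + d_col), Adadelta, mini-batches of 128, gradient re-normalization at norm 2.0) translate directly into a few lines of framework code. The sketch below uses PyTorch purely as an illustration; the paper does not name its framework, and `model`, `loss_fn`, and `batch` are placeholders.

```python
import math
import torch

def init_weight(d_row, d_col):
    """Normal(0, variance 6/(d_row + d_col)) initialization, as quoted above."""
    std = math.sqrt(6.0 / (d_row + d_col))
    return torch.nn.init.normal_(torch.empty(d_row, d_col), mean=0.0, std=std)

def train_step(model, loss_fn, batch, optimizer, max_grad_norm=2.0):
    """One update on a mini-batch (the paper uses mini-batch size 128)."""
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    # Re-normalize the gradient if its norm is larger than 2.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()

# The paper optimizes with Adadelta, e.g.:
# optimizer = torch.optim.Adadelta(model.parameters())
```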