Improved Neural Machine Translation with SMT Features

Authors: Wei He, Zhongjun He, Hua Wu, Haifeng Wang

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed method significantly improves the translation quality of the state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU score on NIST open test sets.
Researcher Affiliation | Industry | Wei He, Zhongjun He, Hua Wu, and Haifeng Wang. Baidu Inc., No. 10, Shangdi 10th Street, Beijing, 100085, China. {hewei06, hezhongjun, wu_hua, wanghaifeng}@baidu.com
Pseudocode | No | The paper includes equations and diagrams but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a link to the open-source GroundHog system (https://github.com/lisa-groundhog/GroundHog), which is used as a baseline, but does not provide code for the authors' own modifications or integrated system.
Open Datasets | No | The training corpora are automatically crawled from the web, containing about 2.2 billion Chinese words and 2.3 billion English words.
Dataset Splits | Yes | We used NIST MT06 as the development set and tested our system on NIST MT08. The evaluation metric is case-insensitive BLEU-4 (Papineni et al., 2002). (A scoring sketch for this metric appears after the table.)
Hardware Specification | Yes | We ran both the training and decoding on a single machine with one GPU card (NVIDIA Tesla K10).
Software Dependencies | No | The paper mentions software tools such as GroundHog, GIZA++, and Moses, but does not specify their version numbers, which are required for reproducibility of ancillary software.
Experiment Setup | Yes | We set the beam size to 10 for decoding. To train the GroundHog system, we limit the vocabulary to the 30K most frequent words for both the source and target languages. The encoder consists of a forward RNN and a backward RNN, each with 1000 hidden units. The decoder has 1000 hidden units. The word embeddings are 620-dimensional. Mini-batch stochastic gradient descent (SGD) together with Adadelta (Zeiler 2012) is used to train the networks. Each mini-batch of SGD contains 50 sentence pairs. Adadelta is used to adapt the learning rate of the parameters (ϵ = 10^-6 and ρ = 0.95). The system is trained with about 1,570,000 updates for the RNN encoder. For the SMT system, we set the stack-limit to 200 and the translation-option-limit to 20. (A configuration sketch collecting these settings follows below.)
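
The experiment-setup row above lists concrete hyperparameters. As a convenience, here is a minimal Python sketch that collects those reported values in one place; the dictionary keys are hypothetical names chosen for readability and are not GroundHog's actual configuration keys.

    # Reported training/decoding settings, gathered for reference.
    # Key names are illustrative; they do NOT match GroundHog's state keys.
    reported_setup = {
        # Vocabulary and embeddings
        "source_vocab_size": 30_000,          # 30K most frequent source words
        "target_vocab_size": 30_000,          # 30K most frequent target words
        "word_embedding_dim": 620,
        # RNN encoder-decoder sizes
        "encoder_forward_hidden_units": 1000,
        "encoder_backward_hidden_units": 1000,
        "decoder_hidden_units": 1000,
        # Optimization: mini-batch SGD with Adadelta (Zeiler 2012)
        "minibatch_sentence_pairs": 50,
        "adadelta_epsilon": 1e-6,
        "adadelta_rho": 0.95,
        "approx_training_updates": 1_570_000,
        # NMT decoding
        "beam_size": 10,
        # SMT decoder limits
        "smt_stack_limit": 200,
        "smt_translation_option_limit": 20,
    }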
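
The dataset-splits row reports case-insensitive BLEU-4 as the evaluation metric. The sketch below shows one way to approximate that score with NLTK after lowercasing; the file names, the whitespace tokenization, and the assumption of four references per segment are illustrative, and the paper itself presumably used the official NIST scoring tools rather than NLTK.

    # Approximate case-insensitive BLEU-4 with NLTK (a rough stand-in only;
    # the official NIST mteval tooling is the usual choice for MT08).
    from nltk.translate.bleu_score import corpus_bleu

    def read_lowercased_tokens(path):
        """One sentence per line; lowercase and split on whitespace."""
        with open(path, encoding="utf-8") as f:
            return [line.strip().lower().split() for line in f]

    # Hypothetical file names for system output and four references.
    hypotheses = read_lowercased_tokens("mt08.system_output.txt")
    references = [
        list(refs) for refs in zip(
            read_lowercased_tokens("mt08.ref0.txt"),
            read_lowercased_tokens("mt08.ref1.txt"),
            read_lowercased_tokens("mt08.ref2.txt"),
            read_lowercased_tokens("mt08.ref3.txt"),
        )
    ]

    # corpus_bleu takes, per segment, a list of reference token lists.
    bleu4 = corpus_bleu(references, hypotheses,
                        weights=(0.25, 0.25, 0.25, 0.25))
    print(f"Case-insensitive BLEU-4: {100 * bleu4:.2f}")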