Improved Neural Machine Translation with SMT Features

Authors: Wei He, Zhongjun He, Hua Wu, Haifeng Wang

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed method significantly improves the translation quality of the state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU score on NIST open test sets.
Researcher Affiliation | Industry | Wei He, Zhongjun He, Hua Wu, and Haifeng Wang. Baidu Inc., No. 10, Shangdi 10th Street, Beijing, 100085, China. {hewei06, hezhongjun, wu_hua, wanghaifeng}@baidu.com
Pseudocode | No | The paper includes equations and diagrams but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a link to the open-source GroundHog system (https://github.com/lisa-groundhog/GroundHog), which is used as a baseline, but does not provide code for the authors' own modifications or integrated system.
Open Datasets | No | The training corpora are automatically crawled from the web, containing about 2.2 billion Chinese words and 2.3 billion English words.
Dataset Splits | Yes | We used NIST MT06 as the development set and tested our system on NIST MT08. The evaluation metric is case-insensitive BLEU-4 (Papineni et al., 2002). (A scoring sketch for this metric appears after the table.)
Hardware Specification | Yes | We ran both the training and decoding on a single machine with one GPU card (NVIDIA Tesla K10).
Software Dependencies | No | The paper mentions software tools such as GroundHog, GIZA++, and Moses, but does not specify their version numbers, which are required for reproducibility of ancillary software.
Experiment Setup | Yes | We set the beam size to 10 for decoding. To train the GroundHog system, we limit the vocabulary to the 30K most frequent words for both the source and target languages. The encoder consists of a forward RNN and a backward RNN, each with 1000 hidden units. The decoder has 1000 hidden units. The word embeddings are 620-dimensional. Mini-batch stochastic gradient descent (SGD) together with Adadelta (Zeiler 2012) is used to train the networks. Each mini-batch of SGD contains 50 sentence pairs. Adadelta is used to adapt the learning rate of the parameters (ϵ = 10^-6 and ρ = 0.95). The system is trained with about 1,570,000 updates for the RNN encoder. For the SMT system, we set the stack-limit to 200 and the translation-option-limit to 20. (A configuration sketch collecting these settings follows below.)
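
The experiment-setup row above lists concrete hyperparameters. As a convenience, here is a minimal Python sketch that collects those reported values in one place; the dictionary keys are hypothetical names chosen for readability and are not GroundHog's actual configuration keys.

    # Reported training/decoding settings, gathered for reference.
    # Key names are illustrative; they do NOT match GroundHog's state keys.
    reported_setup = {
        # Vocabulary and embeddings
        "source_vocab_size": 30_000,          # 30K most frequent source words
        "target_vocab_size": 30_000,          # 30K most frequent target words
        "word_embedding_dim": 620,
        # RNN encoder-decoder sizes
        "encoder_forward_hidden_units": 1000,
        "encoder_backward_hidden_units": 1000,
        "decoder_hidden_units": 1000,
        # Optimization: mini-batch SGD with Adadelta (Zeiler 2012)
        "minibatch_sentence_pairs": 50,
        "adadelta_epsilon": 1e-6,
        "adadelta_rho": 0.95,
        "approx_training_updates": 1_570_000,
        # NMT decoding
        "beam_size": 10,
        # SMT decoder limits
        "smt_stack_limit": 200,
        "smt_translation_option_limit": 20,
    }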
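
The dataset-splits row reports case-insensitive BLEU-4 as the evaluation metric. The sketch below shows one way to approximate that score with NLTK after lowercasing; the file names, the whitespace tokenization, and the assumption of four references per segment are illustrative, and the paper itself presumably used the official NIST scoring tools rather than NLTK.

    # Approximate case-insensitive BLEU-4 with NLTK (a rough stand-in only;
    # the official NIST mteval tooling is the usual choice for MT08).
    from nltk.translate.bleu_score import corpus_bleu

    def read_lowercased_tokens(path):
        """One sentence per line; lowercase and split on whitespace."""
        with open(path, encoding="utf-8") as f:
            return [line.strip().lower().split() for line in f]

    # Hypothetical file names for system output and four references.
    hypotheses = read_lowercased_tokens("mt08.system_output.txt")
    references = [
        list(refs) for refs in zip(
            read_lowercased_tokens("mt08.ref0.txt"),
            read_lowercased_tokens("mt08.ref1.txt"),
            read_lowercased_tokens("mt08.ref2.txt"),
            read_lowercased_tokens("mt08.ref3.txt"),
        )
    ]

    # corpus_bleu takes, per segment, a list of reference token lists.
    bleu4 = corpus_bleu(references, hypotheses,
                        weights=(0.25, 0.25, 0.25, 0.25))
    print(f"Case-insensitive BLEU-4: {100 * bleu4:.2f}")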