Improved Neural Machine Translation with SMT Features
Authors: Wei He, Zhongjun He, Hua Wu, Haifeng Wang
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed method significantly improves the translation quality of the state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU score on NIST open test sets. |
| Researcher Affiliation | Industry | Wei He, Zhongjun He, Hua Wu, and Haifeng Wang. Baidu Inc., No. 10, Shangdi 10th Street, Beijing, 100085, China. {hewei06, hezhongjun, wu_hua, wanghaifeng}@baidu.com |
| Pseudocode | No | The paper includes equations and diagrams but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to the open-source GroundHog system (https://github.com/lisa-groundhog/GroundHog), which is used as a baseline, but does not provide code for the authors' own modifications or integrated system. |
| Open Datasets | No | The training corpora are automatically crawled from the web, containing about 2.2 billion Chinese words and 2.3 billion English words. |
| Dataset Splits | Yes | We used NIST MT06 as the development set and tested our system on NIST MT08. The evaluation metric is case-insensitive BLEU-4 (Papineni et al., 2002). |
| Hardware Specification | Yes | We ran both the training and decoding on a single machine with one GPU card (NVIDIA Tesla K10). |
| Software Dependencies | No | The paper mentions software tools such as GroundHog, GIZA++, and Moses, but does not specify their version numbers, which are needed to reproduce the ancillary software environment. |
| Experiment Setup | Yes | We set beam size to 10 for decoding. To train the GroundHog system, we limit the vocabulary to the 30K most frequent words for both the source and target languages. The encoder consists of a forward RNN and a backward RNN, each with 1000 hidden units. The decoder has 1000 hidden units. The word embeddings are 620-dimensional. Mini-batch stochastic gradient descent (SGD) together with Adadelta (Zeiler 2012) is used to train the networks. Each mini-batch of SGD contains 50 sentence pairs. Adadelta is used to adapt the learning rate of parameters (ϵ = 10⁻⁶ and ρ = 0.95). The system is trained with about 1,570,000 updates for the RNN encoder. For the SMT system, we set the stack-limit to 200 and the translation-option-limit to 20. |
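
The hyperparameters quoted in the Experiment Setup row can be collected in one place for easier comparison against a re-implementation. The Python snippet below is only an illustrative summary; the names `NMT_CONFIG` and `SMT_CONFIG` are ours and do not correspond to the authors' actual GroundHog state file or Moses configuration.

```python
# Hypothetical summary of the hyperparameters reported in the Experiment Setup row.
# These dictionaries are illustrative only; they are not the authors' GroundHog
# state file or Moses/SMT configuration.

NMT_CONFIG = {
    "vocab_size_source": 30_000,      # 30K most frequent source words
    "vocab_size_target": 30_000,      # 30K most frequent target words
    "encoder": {"type": "bidirectional_rnn", "hidden_units": 1000},  # forward + backward RNN
    "decoder": {"hidden_units": 1000},
    "word_embedding_dim": 620,
    "optimizer": {
        "method": "sgd_with_adadelta",  # mini-batch SGD, Adadelta-adapted learning rate
        "batch_size": 50,               # sentence pairs per mini-batch
        "adadelta_epsilon": 1e-6,
        "adadelta_rho": 0.95,
    },
    "training_updates": 1_570_000,      # approximate number of encoder RNN updates
    "beam_size": 10,                    # decoding beam
}

SMT_CONFIG = {
    "stack_limit": 200,
    "translation_option_limit": 20,
}
```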
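
The Dataset Splits row reports tuning on NIST MT06 and testing on NIST MT08 with case-insensitive BLEU-4. A minimal way to reproduce that metric today is sketched below using the sacrebleu package; this is an assumption on our part, since the paper does not name the exact BLEU script it used, and the file paths are placeholders.

```python
# Minimal sketch of case-insensitive BLEU-4 scoring, assuming the sacrebleu package.
# The paper does not specify which BLEU implementation it used; the file paths in
# the usage example are placeholders for an MT08 system output and its references.
import sacrebleu


def case_insensitive_bleu4(hypothesis_file, reference_files):
    """Compute case-insensitive corpus BLEU-4 by lowercasing both sides."""
    with open(hypothesis_file, encoding="utf-8") as f:
        hypotheses = [line.strip().lower() for line in f]

    reference_streams = []
    for path in reference_files:
        with open(path, encoding="utf-8") as f:
            reference_streams.append([line.strip().lower() for line in f])

    # sacrebleu's default is 4-gram BLEU, matching the BLEU-4 metric in the paper.
    return sacrebleu.corpus_bleu(hypotheses, reference_streams)


# Example with placeholder paths; NIST sets provide multiple references per segment.
# score = case_insensitive_bleu4("mt08.hyp", ["mt08.ref0", "mt08.ref1", "mt08.ref2", "mt08.ref3"])
# print(score.score)
```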