Neural Machine Translation Advised by Statistical Machine Translation
Authors: Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets. |
| Researcher Affiliation | Collaboration | Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang. Soochow University, Suzhou (xingwsuda@gmail.com, {dyxiong, minzhang}@suda.edu.cn); Noah's Ark Lab, Huawei Technologies, Hong Kong ({lu.zhengdong, tu.zhaopeng, hangli.hl}@huawei.com) |
| Pseudocode | No | The paper includes diagrams and descriptive text of the model, but no formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the code for the methodology described in this paper is open source or provide a direct link to a code repository. |
| Open Datasets | Yes | The training set is a parallel corpus from LDC, containing 1.25M sentence pairs with 27.9M Chinese words and 34.5M English words. We use NIST 2006 dataset as development set, and NIST 2002, 2003, 2004, 2005 and 2008 datasets as test sets. |
| Dataset Splits | Yes | We use NIST 2006 dataset as development set, and NIST 2002, 2003, 2004, 2005 and 2008 datasets as test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'KenLM', 'GroundHog', 'Moses', and 'Adadelta', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We limit the source and target vocabularies to the most frequent 30K words in Chinese and English, covering approximately 97.7% and 99.3% of the data in the two languages respectively. All other words are mapped to a special token UNK. We train the model with the sentences of length up to 50 words in training data and keep the test data at the original length. The word embedding dimension of both sides is 620 and the size of hidden layer is 1000. All the other settings are the same as in (Bahdanau, Cho, and Bengio 2015). We also use our implementation of RNNSearch which adopts feedback attention and dropout as NMT baseline system. Dropout is applied only on the output layer and the dropout rate is set to 0.5. We use a simple left-to-right beam search decoder with beam size 10 to find the most likely translation. ... We use a minibatch stochastic gradient descent (SGD) algorithm together with Adadelta (Zeiler 2012) to train the NMT models and the decay rates ρ and ϵ are set as 0.95 and 10^-6. Each SGD update direction is computed using a minibatch of 80 sentences. ... To ensure the quality of SMT recommendations, we set N_tm to 5 and N_rec to 25. We adopt a feedforward neural network with two hidden layers for the SMT classifier (Equation (11)) and gating function (Equation (12)). The numbers of units in the hidden layers are 2000 and 500 respectively. |
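
For a quick view of how the reported settings fit together, the sketch below collects the hyperparameters quoted in the Experiment Setup row into a single configuration dictionary. This is a minimal illustration only; the key names (e.g. `src_vocab_size`, `n_tm`) are our own assumptions, since the authors did not release code.

```python
# Hedged sketch: hyperparameters reported in the paper, gathered into one
# config dict. Key names are illustrative assumptions, not the authors' code.
config = {
    # Vocabulary and data filtering
    "src_vocab_size": 30_000,          # covers ~97.7% of Chinese tokens
    "tgt_vocab_size": 30_000,          # covers ~99.3% of English tokens
    "unk_token": "UNK",                # all out-of-vocabulary words
    "max_train_sentence_length": 50,   # test sentences kept at original length

    # RNNSearch baseline dimensions (Bahdanau, Cho, and Bengio 2015)
    "embedding_dim": 620,
    "hidden_size": 1000,
    "dropout_output_layer": 0.5,       # dropout only on the output layer

    # Decoding
    "beam_size": 10,                   # left-to-right beam search

    # Optimization: minibatch SGD with Adadelta (Zeiler 2012)
    "batch_size": 80,
    "adadelta_rho": 0.95,
    "adadelta_epsilon": 1e-6,

    # SMT recommendation component
    "n_tm": 5,                         # N_tm in the paper
    "n_rec": 25,                       # N_rec in the paper
    "classifier_hidden_units": (2000, 500),  # SMT classifier / gating MLP
}

if __name__ == "__main__":
    for key, value in config.items():
        print(f"{key}: {value}")
```

A dictionary like this can be passed to whatever training script reimplements the model, which makes it easy to check a reproduction attempt against the numbers the paper actually reports.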