Improved Neural Machine Translation with Source Syntax
Authors: Shuangzhi Wu, Ming Zhou, Dongdong Zhang
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on publicly available data sets with Chinese-English and English-Japanese translation tasks. Experimental results on Chinese-English task show that our model significantly improves translation accuracy over the conventional NMT and SMT baseline systems. |
| Researcher Affiliation | Collaboration | Harbin Institute of Technology, Harbin, China; Microsoft Research; {v-shuawu, mingzhou, dozhang}@microsoft.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not explicitly state that source code for their method is made publicly available. |
| Open Datasets | Yes | We conduct experiments on the Chinese-English translation task as well as the English-Japanese translation task where the same data set from WAT 2016 ASPEC corpus [Nakazawa et al., 2016] is used for a fair comparison with other work. In the Chinese-English translation task, the bilingual training data consists of a set of LDC datasets. |
| Dataset Splits | Yes | The development data set is NIST2003, and the testing data are NIST2005, NIST2006, NIST2008 and NIST2012 evaluation sets. The development data contains 1,790 sentences, and the test data contains 1,812 sentences with single reference per source sentence. Five groups of sentences are collected on the Japanese test set and the merged Chinese test set of NIST 2005, NIST 2006, NIST 2008 and NIST 2012, where source length ranges are {20-, 20-30, 30-40, 40-50, 50+}. The statistics of the five groups are shown in Table 3. (A sketch of this length grouping appears after the table.) |
| Hardware Specification | Yes | All model parameters are initialized randomly with Gaussian distribution and trained on a NVIDIA Tesla K40 GPU. |
| Software Dependencies | No | The paper mentions tools such as 'KyTea' and the Adadelta algorithm but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The size of word embeddings is set to 512 for both tasks. The dimensions of hidden states for all RNNs are set to 1024. The stochastic gradient descent (SGD) algorithm is used to tune parameters with a learning rate of 1.0 and a batch size of 128. We use the beam search strategy for decoding with a beam size of 12. |
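
The Dataset Splits row reports a length-based analysis in which test sentences are grouped into five source-length ranges ({20-, 20-30, 30-40, 40-50, 50+}). Below is a minimal sketch of that grouping, assuming whitespace-tokenized source sentences; the paper does not publish its grouping script, so the boundary handling (inclusive upper bounds) and the file path in the usage comment are assumptions.

```python
from collections import defaultdict

def length_bucket(num_tokens):
    """Map a source-sentence length to one of the five ranges used in the
    paper's length analysis: 20-, 20-30, 30-40, 40-50, 50+.
    Boundary handling (inclusive upper bounds) is an assumption."""
    if num_tokens < 20:
        return "20-"
    if num_tokens <= 30:
        return "20-30"
    if num_tokens <= 40:
        return "30-40"
    if num_tokens <= 50:
        return "40-50"
    return "50+"

def group_by_length(source_sentences):
    """Collect sentences into the five length groups."""
    groups = defaultdict(list)
    for sent in source_sentences:
        groups[length_bucket(len(sent.split()))].append(sent)
    return groups

# Usage on the merged NIST 2005/2006/2008/2012 Chinese test set
# (the file name is hypothetical):
# with open("nist_merged.zh", encoding="utf-8") as f:
#     groups = group_by_length(f.read().splitlines())
# for key in ["20-", "20-30", "30-40", "40-50", "50+"]:
#     print(key, len(groups[key]))
```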
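The Experiment Setup row lists the reported hyperparameters (512-dimensional word embeddings, 1024-dimensional RNN hidden states, SGD with learning rate 1.0, batch size 128, beam size 12). The sketch below simply collects those values into one configuration object; the dataclass and its field names are illustrative, and only the numeric values come from the paper.

```python
from dataclasses import dataclass

@dataclass
class NMTConfig:
    """Hyperparameters reported in the paper's experiment setup.
    The structure and field names are illustrative; only the values
    are taken from the paper."""
    embedding_dim: int = 512   # word embedding size (both tasks)
    hidden_dim: int = 1024     # hidden-state size for all RNNs
    optimizer: str = "sgd"     # stochastic gradient descent
    learning_rate: float = 1.0
    batch_size: int = 128
    beam_size: int = 12        # beam search width at decoding time

config = NMTConfig()
print(config)
```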