Syntax-Directed Attention for Neural Machine Translation
Authors: Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on the large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system. |
| Researcher Affiliation | Academia | 1 Harbin Institute of Technology, Harbin, China; 2 National Institute of Information and Communications Technology, Kyoto, Japan. Emails: {khchen, tjzhao}@hit.edu.cn, {wangrui, mutiyama, eiichiro.sumita}@nict.go.jp |
| Pseudocode | No | The paper presents mathematical equations and conceptual figures (Figure 1, 2, 3, 4) but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the 'Nematus' toolkit: 'All NMT models were implemented in the NMT toolkit Nematus (Sennrich et al. 2017). We used the Stanford parser (Chang et al. 2009) to generate the dependency trees for source language sentences, such as Chinese sentences of ZH-EN and English sentences of EN-DE translation tasks.' It provides a link to Nematus, but it does not state that the code for the proposed method itself is open-source or otherwise available (a preprocessing sketch for the dependency-parsing step follows the table). |
| Open Datasets | Yes | For the English (EN) to German (DE) translation task, 4.43 million bilingual sentence pairs from the WMT 2014 data set were used as the training data, including Common Crawl, News Commentary and Europarl v7. For the Chinese (ZH) to English (EN) translation task, the training data set was 1.42 million bilingual sentence pairs from LDC corpora, consisting of LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06. |
| Dataset Splits | Yes | newstest2012 and newstest2013/2014/2015 were used as the dev set and test sets, respectively, for EN-DE; the NIST02 and NIST03/04/05/06/08 data sets were used as the dev set and test sets, respectively, for ZH-EN (see the data-split sketch below the table). |
| Hardware Specification | Yes | Our NMT models were trained about 400k mini-batches using ADADELTA optimizer (Zeiler 2012), taking six days on a single Tesla P100 GPU, and the beam size for decoding was 12. |
| Software Dependencies | No | The paper mentions using the 'Nematus' toolkit and the 'Stanford parser' but does not specify their version numbers or the versions of any other software dependencies. |
| Experiment Setup | Yes | We limited the source and target vocabularies to 50K, and the maximum sentence length was 80. We shuffled the training set before training, and the mini-batch size was 80. The word embedding dimension was 620 and the hidden layer dimension was 1000, and the default dropout technique (Hinton et al. 2012) in Nematus was used on all layers. Our NMT models were trained for about 400k mini-batches using the ADADELTA optimizer (Zeiler 2012), taking six days on a single Tesla P100 GPU, and the beam size for decoding was 12 (these settings are collected in the configuration sketch below the table). |
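
The corpus and split information quoted in the Open Datasets and Dataset Splits rows can be summarized in one place. The Python dictionary below is a convenience sketch only; the keys and string labels are illustrative and do not correspond to file paths or configuration fields used by the authors.

```python
# Hypothetical summary of the training corpora and dev/test splits reported in
# the paper. Keys and labels are illustrative, not paths or config fields from
# the authors' setup.
DATA_SPLITS = {
    "zh-en": {
        "train": "LDC corpora, 1.42M sentence pairs (LDC2002E18, LDC2003E07, "
                 "LDC2003E14, Hansards portion of LDC2004T07, LDC2004T08, LDC2005T06)",
        "dev": "NIST02",
        "test": ["NIST03", "NIST04", "NIST05", "NIST06", "NIST08"],
    },
    "en-de": {
        "train": "WMT 2014, 4.43M sentence pairs (Common Crawl, News Commentary, Europarl v7)",
        "dev": "newstest2012",
        "test": ["newstest2013", "newstest2014", "newstest2015"],
    },
}
```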
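The hyperparameters quoted in the Hardware Specification and Experiment Setup rows can likewise be gathered into a single configuration sketch. The PyTorch snippet below is not the authors' Nematus implementation; the toy encoder class and the dropout rate are assumptions made for illustration, and only the numeric settings (50K vocabularies, length 80, batch 80, 620/1000 dimensions, about 400k updates, ADADELTA, beam size 12) are taken from the quoted text.

```python
# A minimal PyTorch sketch of the reported training setup. This is NOT the
# authors' Nematus code; the toy encoder and the dropout rate are illustrative
# assumptions. Only the numeric hyperparameters are taken from the paper.
import torch
import torch.nn as nn

CONFIG = {
    "src_vocab_size": 50_000,   # source vocabulary limited to 50K
    "tgt_vocab_size": 50_000,   # target vocabulary limited to 50K
    "max_sentence_length": 80,  # maximum sentence length
    "batch_size": 80,           # mini-batch size
    "embedding_dim": 620,       # word embedding dimension
    "hidden_dim": 1000,         # hidden layer dimension
    "dropout": 0.2,             # rate assumed; the paper only says "Nematus default dropout"
    "train_updates": 400_000,   # about 400k mini-batch updates
    "beam_size": 12,            # beam size for decoding
}

class ToyEncoder(nn.Module):
    """Illustrative bidirectional GRU encoder using the paper's dimensions."""
    def __init__(self, cfg: dict):
        super().__init__()
        self.embed = nn.Embedding(cfg["src_vocab_size"], cfg["embedding_dim"])
        self.drop = nn.Dropout(cfg["dropout"])
        self.rnn = nn.GRU(cfg["embedding_dim"], cfg["hidden_dim"],
                          batch_first=True, bidirectional=True)

    def forward(self, src_ids: torch.Tensor):
        # src_ids: (batch, seq_len) integer token ids
        return self.rnn(self.drop(self.embed(src_ids)))

encoder = ToyEncoder(CONFIG)
# ADADELTA as reported; rho and eps follow Zeiler (2012) since the paper does not state them.
optimizer = torch.optim.Adadelta(encoder.parameters(), rho=0.95, eps=1e-6)
```

The beam size only matters at decoding time; it is kept in `CONFIG` purely so that all reported settings sit in one place.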
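The Open Source Code row notes that the Stanford parser was used to produce dependency trees for the source sentences. As a rough illustration of what that preprocessing step yields, the sketch below reads a CoNLL-X-style parse (one plausible output format; the authors' actual parser invocation and output format are not specified) and extracts per-token head indices plus a simple depth-from-root signal.

```python
# Illustrative preprocessing sketch, assuming the dependency parser emits
# CoNLL-X-style lines (column 7 = head ID, 0 for the root). The authors' actual
# parser invocation and output format are not described in the paper.
from typing import List

def read_conll_heads(conll_block: str) -> List[int]:
    """Return, for each token, the 0-based index of its head (-1 for the root)."""
    heads = []
    for line in conll_block.strip().splitlines():
        cols = line.split("\t")
        head_id = int(cols[6])          # CoNLL-X HEAD column
        heads.append(head_id - 1 if head_id > 0 else -1)
    return heads

def depth_from_root(heads: List[int]) -> List[int]:
    """Tree depth of each token, one structural signal derivable from the parse."""
    def depth(i: int) -> int:
        return 0 if heads[i] == -1 else 1 + depth(heads[i])
    return [depth(i) for i in range(len(heads))]

# Tiny hand-made example ("She reads books"), tab-separated CoNLL-X-like lines.
example = (
    "1\tShe\t_\tPRP\tPRP\t_\t2\tnsubj\n"
    "2\treads\t_\tVBZ\tVBZ\t_\t0\troot\n"
    "3\tbooks\t_\tNNS\tNNS\t_\t2\tdobj"
)
heads = read_conll_heads(example)
print(heads)                   # [1, -1, 1]
print(depth_from_root(heads))  # [1, 0, 1]
```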