Syntax-Directed Attention for Neural Machine Translation
Authors: Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on the large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system. |
| Researcher Affiliation | Academia | 1 Harbin Institute of Technology, Harbin, China; 2 National Institute of Information and Communications Technology, Kyoto, Japan. Emails: {khchen, tjzhao}@hit.edu.cn, {wangrui, mutiyama, eiichiro.sumita}@nict.go.jp |
| Pseudocode | No | The paper presents mathematical equations and conceptual figures (Figure 1, 2, 3, 4) but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the 'Nematus' toolkit: 'All NMT models were implemented in the NMT toolkit Nematus (Sennrich et al. 2017). We used the Stanford parser (Chang et al. 2009) to generate the dependency trees for source language sentences, such as Chinese sentences of ZH-EN and English sentences of EN-DE translation tasks.' It provides a link to Nematus, but it does not state that the code for the proposed method itself is open-source or otherwise available (a preprocessing sketch for the dependency-parsing step follows the table). |
| Open Datasets | Yes | For the English (EN) to German (DE) translation task, 4.43 million bilingual sentence pairs from the WMT 2014 data set were used as the training data, including Common Crawl, News Commentary and Europarl v7. For the Chinese (ZH) to English (EN) translation task, the training data set was 1.42 million bilingual sentence pairs from LDC corpora, consisting of LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06. |
| Dataset Splits | Yes | newstest2012 and newstest2013/2014/2015 were used as the dev set and test sets, respectively, for EN-DE; the NIST02 and NIST03/04/05/06/08 data sets were used as the dev set and test sets, respectively, for ZH-EN (see the data-split sketch below the table). |
| Hardware Specification | Yes | Our NMT models were trained about 400k mini-batches using ADADELTA optimizer (Zeiler 2012), taking six days on a single Tesla P100 GPU, and the beam size for decoding was 12. |
| Software Dependencies | No | The paper mentions using the 'Nematus' toolkit and the 'Stanford parser' but does not specify their version numbers or the versions of any other software dependencies. |
| Experiment Setup | Yes | We limited the source and target vocabularies to 50K, and the maximum sentence length was 80. We shuffled the training set before training, and the mini-batch size was 80. The word embedding dimension was 620 and the hidden layer dimension was 1000, and the default dropout technique (Hinton et al. 2012) in Nematus was used on all layers. Our NMT models were trained for about 400k mini-batches using the ADADELTA optimizer (Zeiler 2012), taking six days on a single Tesla P100 GPU, and the beam size for decoding was 12 (these settings are collected in the configuration sketch below the table). |
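
The corpus and split information quoted in the Open Datasets and Dataset Splits rows can be summarized in one place. The Python dictionary below is a convenience sketch only; the keys and string labels are illustrative and do not correspond to file paths or configuration fields used by the authors.

```python
# Hypothetical summary of the training corpora and dev/test splits reported in
# the paper. Keys and labels are illustrative, not paths or config fields from
# the authors' setup.
DATA_SPLITS = {
    "zh-en": {
        "train": "LDC corpora, 1.42M sentence pairs (LDC2002E18, LDC2003E07, "
                 "LDC2003E14, Hansards portion of LDC2004T07, LDC2004T08, LDC2005T06)",
        "dev": "NIST02",
        "test": ["NIST03", "NIST04", "NIST05", "NIST06", "NIST08"],
    },
    "en-de": {
        "train": "WMT 2014, 4.43M sentence pairs (Common Crawl, News Commentary, Europarl v7)",
        "dev": "newstest2012",
        "test": ["newstest2013", "newstest2014", "newstest2015"],
    },
}
```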
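The hyperparameters quoted in the Hardware Specification and Experiment Setup rows can likewise be gathered into a single configuration sketch. The PyTorch snippet below is not the authors' Nematus implementation; the toy encoder class and the dropout rate are assumptions made for illustration, and only the numeric settings (50K vocabularies, length 80, batch 80, 620/1000 dimensions, about 400k updates, ADADELTA, beam size 12) are taken from the quoted text.

```python
# A minimal PyTorch sketch of the reported training setup. This is NOT the
# authors' Nematus code; the toy encoder and the dropout rate are illustrative
# assumptions. Only the numeric hyperparameters are taken from the paper.
import torch
import torch.nn as nn

CONFIG = {
    "src_vocab_size": 50_000,   # source vocabulary limited to 50K
    "tgt_vocab_size": 50_000,   # target vocabulary limited to 50K
    "max_sentence_length": 80,  # maximum sentence length
    "batch_size": 80,           # mini-batch size
    "embedding_dim": 620,       # word embedding dimension
    "hidden_dim": 1000,         # hidden layer dimension
    "dropout": 0.2,             # rate assumed; the paper only says "Nematus default dropout"
    "train_updates": 400_000,   # about 400k mini-batch updates
    "beam_size": 12,            # beam size for decoding
}

class ToyEncoder(nn.Module):
    """Illustrative bidirectional GRU encoder using the paper's dimensions."""
    def __init__(self, cfg: dict):
        super().__init__()
        self.embed = nn.Embedding(cfg["src_vocab_size"], cfg["embedding_dim"])
        self.drop = nn.Dropout(cfg["dropout"])
        self.rnn = nn.GRU(cfg["embedding_dim"], cfg["hidden_dim"],
                          batch_first=True, bidirectional=True)

    def forward(self, src_ids: torch.Tensor):
        # src_ids: (batch, seq_len) integer token ids
        return self.rnn(self.drop(self.embed(src_ids)))

encoder = ToyEncoder(CONFIG)
# ADADELTA as reported; rho and eps follow Zeiler (2012) since the paper does not state them.
optimizer = torch.optim.Adadelta(encoder.parameters(), rho=0.95, eps=1e-6)
```

The beam size only matters at decoding time; it is kept in `CONFIG` purely so that all reported settings sit in one place.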
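The Open Source Code row notes that the Stanford parser was used to produce dependency trees for the source sentences. As a rough illustration of what that preprocessing step yields, the sketch below reads a CoNLL-X-style parse (one plausible output format; the authors' actual parser invocation and output format are not specified) and extracts per-token head indices plus a simple depth-from-root signal.

```python
# Illustrative preprocessing sketch, assuming the dependency parser emits
# CoNLL-X-style lines (column 7 = head ID, 0 for the root). The authors' actual
# parser invocation and output format are not described in the paper.
from typing import List

def read_conll_heads(conll_block: str) -> List[int]:
    """Return, for each token, the 0-based index of its head (-1 for the root)."""
    heads = []
    for line in conll_block.strip().splitlines():
        cols = line.split("\t")
        head_id = int(cols[6])          # CoNLL-X HEAD column
        heads.append(head_id - 1 if head_id > 0 else -1)
    return heads

def depth_from_root(heads: List[int]) -> List[int]:
    """Tree depth of each token, one structural signal derivable from the parse."""
    def depth(i: int) -> int:
        return 0 if heads[i] == -1 else 1 + depth(heads[i])
    return [depth(i) for i in range(len(heads))]

# Tiny hand-made example ("She reads books"), tab-separated CoNLL-X-like lines.
example = (
    "1\tShe\t_\tPRP\tPRP\t_\t2\tnsubj\n"
    "2\treads\t_\tVBZ\tVBZ\t_\t0\troot\n"
    "3\tbooks\t_\tNNS\tNNS\t_\t2\tdobj"
)
heads = read_conll_heads(example)
print(heads)                   # [1, -1, 1]
print(depth_from_root(heads))  # [1, 0, 1]
```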