Attention-via-Attention Neural Machine Translation
Authors: Shenjian Zhao, Zhihua Zhang
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness and the efficiency of the proposed attention-via-attention model on the WMT 15 En-Fr and En-Cs translation tasks. We conduct comparison with various strong baselines including RNNsearch (Bahdanau, Cho, and Bengio 2015), GNMT (Wu et al. 2016), bpe2char models (Chung, Cho, and Bengio 2016), char2char models (Lee, Cho, and Hofmann 2016) and hybrid models (Luong and Manning 2016). For fair comparison, two metrics are used: BLEU (Papineni et al. 2002) and chrF3 (Popović 2015). A simplified chrF3 sketch is given after this table. |
| Researcher Affiliation | Academia | Shenjian Zhao, Department of Computer Science and Engineering, Shanghai Jiao Tong University, sword.york@gmail.com; Zhihua Zhang, Peking University & Beijing Institute of Big Data Research, zhzhang@math.pku.edu.cn |
| Pseudocode | No | The paper describes the architecture and mathematical formulations of the proposed model but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper mentions: 'We use the scripts from Moses to compute the BLEU score. For chrF3, we use the implementation from github: https://github.com/rsennrich/subword-nmt.' These links are to third-party tools used for evaluation, not the authors' own source code for their proposed model or methodology. There is no statement indicating the release of their own implementation code. |
| Open Datasets | Yes | We use the parallel corpora from WMT. When comparing with RNNsearch on En-Fr task, we reduce the size of the combined corpus to have 12.1M sentence pairs for fairness. When comparing with GNMT, we use the whole dataset which contains 36M parallel sentences. For En-Cs, we use all parallel corpora available for WMT 15. The URL http://www.statmt.org/wmt15/translation-task.html is also provided. |
| Dataset Splits | Yes | We use newstest2013 as the development set and evaluate the models on newstest2014 and newstest2015 for the En-Fr and En-Cs tasks, respectively. |
| Hardware Specification | Yes | We train each shallow model for approximately 2 weeks on a single Titan X GPU. |
| Software Dependencies | No | The paper mentions using the ADAM optimizer, Moses scripts, and subword-nmt implementation, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We use the ADAM optimizer (Kingma and Ba 2015) with minibatches of 100 sentences to train each model. The learning rate is first set to 5e-4 and then halved every epoch. The norm of the gradient is clipped with a threshold of 1. The beam width is set to 12 for all models. A hyperparameter sketch based on these settings is shown after this table. |
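
The optimization settings quoted in the Experiment Setup row can be mirrored in a minimal PyTorch sketch. This is not the authors' code (none is released); the model, data, and loss below are placeholder assumptions, and only the ADAM learning rate of 5e-4 halved every epoch, the gradient-norm clipping at 1.0, and the minibatch size of 100 reflect the paper.

```python
import torch
import torch.nn as nn

# Placeholder stand-in for the (unreleased) attention-via-attention model;
# only the optimization hyperparameters below follow the paper.
model = nn.Linear(16, 8)
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)                 # ADAM, initial lr 5e-4
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)  # halve lr every epoch

num_epochs, batches_per_epoch, batch_size = 3, 10, 100                    # 100 sentences per minibatch
for epoch in range(num_epochs):
    for _ in range(batches_per_epoch):
        x = torch.randn(batch_size, 16)                                   # dummy inputs
        y = torch.randint(0, 8, (batch_size,))                            # dummy targets
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradient norm at 1
        optimizer.step()
    scheduler.step()                                                      # learning rate halved each epoch
```

The reported beam width of 12 applies to decoding, which is not part of this training sketch.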
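
For the chrF3 metric cited in the Research Type row, the following is a simplified sentence-level implementation in the spirit of Popović (2015): character n-gram precision and recall (n = 1 to 6) are averaged and combined with an F-score that weights recall three times as much as precision (beta = 3). The function names and the whitespace handling are illustrative assumptions; the paper itself relies on a third-party implementation.

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams of a string (whitespace removed; an assumption of this sketch)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=3.0):
    """Simplified sentence-level chrF: character n-gram precision/recall averaged
    over n = 1..max_n, combined with an F-beta score (beta=3 gives chrF3)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        matches = sum((hyp & ref).values())          # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(matches / sum(hyp.values()))
            recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p, r = sum(precisions) / len(precisions), sum(recalls) / len(recalls)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r) if p + r else 0.0

print(round(chrf("the cat sat on the mat", "the cat is on the mat"), 3))
```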