Neural Machine Translation by Jointly Learning to Align and Translate

Authors: Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio

ICLR 2015

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate the proposed approach on the task of English-to-French translation. We use the bilingual, parallel corpora provided by ACL WMT '14. As a comparison, we also report the performance of an RNN Encoder-Decoder which was proposed recently by Cho et al. (2014a)." |
| Researcher Affiliation | Academia | Dzmitry Bahdanau (Jacobs University Bremen, Germany); Kyunghyun Cho and Yoshua Bengio (Université de Montréal) |
| Pseudocode | No | The paper describes the model architecture and training procedure using mathematical equations and descriptive text, but no explicit "Pseudocode" or "Algorithm" block is provided (an illustrative sketch of the attention computation appears below this table). |
| Open Source Code | Yes | Implementations are available at https://github.com/lisa-groundhog/GroundHog. |
| Open Datasets | Yes | "We use the bilingual, parallel corpora provided by ACL WMT '14." (http://www.statmt.org/wmt14/translation-task.html) |
| Dataset Splits | Yes | "We concatenate news-test-2012 and news-test-2013 to make a development (validation) set, and evaluate the models on the test set (news-test-2014) from WMT '14, which consists of 3003 sentences not present in the training data." |
| Hardware Specification | Yes | NVIDIA TITAN Black and Quadro K-6000 GPUs (from Table 2) |
| Software Dependencies | No | The paper mentions Theano, the Adadelta optimizer, and the Moses tokenizer, but specific version numbers for these dependencies are not provided. |
| Experiment Setup | Yes | "The encoder and decoder of the RNNencdec have 1000 hidden units each. ... We use a minibatch stochastic gradient descent (SGD) algorithm together with Adadelta (Zeiler, 2012) to train each model. Each SGD update direction is computed using a minibatch of 80 sentences." Further details in Appendices A.2.3 and B.2: 1000 hidden units, 620-dimensional word embeddings, a maxout hidden layer of size 500, Adadelta parameters ε = 10⁻⁶ and ρ = 0.95, and gradient rescaling when the norm exceeds 1 (see the optimizer sketch below this table). |
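
Since the paper supplies equations but no pseudocode, the following is a minimal NumPy sketch of the additive attention it describes: alignment scores e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), weights α_i = softmax(e_i), and context c_i = Σ_j α_ij h_j. The function name, weight shapes, and toy dimensions are illustrative assumptions, not the authors' GroundHog code.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(s_prev, H, W_a, U_a, v_a):
    """One decoder step of additive attention.

    s_prev : (n,)    previous decoder state s_{i-1}
    H      : (T, 2n) encoder annotations h_1..h_T (bidirectional, hence 2n)
    Returns the context vector c_i (2n,) and alignment weights alpha (T,).
    """
    # Alignment scores: e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    scores = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a  # (T,)
    alpha = softmax(scores)                              # alignment weights
    c = alpha @ H                                        # expected annotation
    return c, alpha

# Toy usage with arbitrary sizes (n = 4 decoder units, T = 6 source words)
rng = np.random.default_rng(0)
n, T = 4, 6
s_prev = rng.normal(size=n)
H = rng.normal(size=(T, 2 * n))
W_a = rng.normal(size=(n, n))       # alignment-layer width chosen = n here
U_a = rng.normal(size=(n, 2 * n))
v_a = rng.normal(size=n)
c, alpha = attention_context(s_prev, H, W_a, U_a, v_a)
assert np.isclose(alpha.sum(), 1.0)  # weights form a distribution over source words
```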
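
The experiment-setup row pins down the optimizer hyperparameters precisely, which is enough to reconstruct the training configuration. Below is a hedged sketch in PyTorch (the original implementation was Theano/GroundHog); the single-GRU stand-in model and the dummy loss are assumptions for illustration, while the Adadelta parameters (ρ = 0.95, ε = 10⁻⁶), the minibatch of 80 sentences, and the gradient-norm threshold of 1 come from the paper.

```python
import torch

# Stand-in model with the paper's dimensions (620-dim embeddings in,
# 1000 hidden units); the real model is the full RNNsearch encoder-decoder.
model = torch.nn.GRU(input_size=620, hidden_size=1000, batch_first=True)
optimizer = torch.optim.Adadelta(model.parameters(), rho=0.95, eps=1e-6)

x = torch.randn(80, 10, 620)   # one minibatch: 80 sequences of length 10
out, _ = model(x)
loss = out.pow(2).mean()       # dummy loss, just to produce gradients
loss.backward()
# Rescale gradients whenever their L2 norm exceeds the threshold of 1
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```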