Sequence to Sequence Learning with Neural Networks
Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. |
| Researcher Affiliation | Industry | Ilya Sutskever, Google, ilyasu@google.com; Oriol Vinyals, Google, vinyals@google.com; Quoc V. Le, Google, qvl@google.com |
| Pseudocode | No | The paper describes the model and processes via text and equations, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about the release of their source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | We used the WMT'14 English to French dataset. We trained our models on a subset of 12M sentences consisting of 348M French words and 304M English words, which is a clean selected subset from [29]. We chose this translation task and this specific training set subset because of the public availability of a tokenized training and test set together with 1000-best lists from the baseline SMT [29]. |
| Dataset Splits | No | The paper mentions 'training and test set' but does not specify a separate validation set with explicit sizes, percentages, or methodology for its split. |
| Hardware Specification | No | The paper mentions running experiments 'on a single GPU' and using 'an 8-GPU machine', but does not provide specific details such as GPU model numbers, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions a 'C++ implementation' but does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or compiler versions). |
| Experiment Setup | Yes | We used deep LSTMs with 4 layers, with 1000 cells at each layer and 1000-dimensional word embeddings, with an input vocabulary of 160,000 and an output vocabulary of 80,000. We initialized all of the LSTM's parameters with the uniform distribution between -0.08 and 0.08. We used stochastic gradient descent without momentum, with a fixed learning rate of 0.7. After 5 epochs, we began halving the learning rate every half epoch. We trained our models for a total of 7.5 epochs. We used batches of 128 sequences for the gradient and divided it by the size of the batch (namely, 128). |
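
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the quoted configuration. The authors' implementation was in C++ and was not released, so the module layout, optimizer wiring, and learning-rate helper below are illustrative assumptions rather than the paper's code.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB = 160_000, 80_000   # input / output vocabulary sizes
EMB_DIM, HIDDEN, LAYERS = 1000, 1000, 4  # 1000-d embeddings, 4 LSTM layers of 1000 cells


class Seq2SeqLSTM(nn.Module):
    """Encoder-decoder LSTM with the layer sizes reported in the paper."""

    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.encoder = nn.LSTM(EMB_DIM, HIDDEN, num_layers=LAYERS)
        self.decoder = nn.LSTM(EMB_DIM, HIDDEN, num_layers=LAYERS)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)
        # "We initialized all of the LSTM's parameters with the uniform
        # distribution between -0.08 and 0.08."
        for p in self.parameters():
            nn.init.uniform_(p, -0.08, 0.08)

    def forward(self, src, tgt):
        # src, tgt: (seq_len, batch) tensors of token ids.
        _, state = self.encoder(self.src_emb(src))            # final (h, c) of the encoder
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)   # decoder conditioned on it
        return self.out(dec_out)                              # per-step vocabulary logits


model = Seq2SeqLSTM()
# SGD without momentum at a fixed learning rate of 0.7.
optimizer = torch.optim.SGD(model.parameters(), lr=0.7, momentum=0.0)
# A mean-reduced loss over a batch of 128 sequences corresponds to dividing
# the summed gradient by the batch size, as described in the paper.
criterion = nn.CrossEntropyLoss(reduction="mean")


def lr_for_epoch(epoch: float, base_lr: float = 0.7) -> float:
    """After 5 epochs, halve the learning rate every half epoch (7.5 epochs total)."""
    return base_lr if epoch <= 5.0 else base_lr * 0.5 ** ((epoch - 5.0) / 0.5)
```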
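
The headline figures in the Research Type row are corpus-level BLEU scores (34.8 for the LSTM versus 33.3 for the phrase-based SMT baseline). The paper does not state which BLEU implementation it used; the sketch below only illustrates how a corpus BLEU score can be computed with the sacrebleu library, using invented example sentences.

```python
# Example sentences are invented; this is an illustration of the metric,
# not a reproduction of the paper's evaluation pipeline.
import sacrebleu

hypotheses = ["the cat sits on the mat"]    # one system translation per test sentence
references = [["the cat sat on the mat"]]   # one inner list per reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"corpus BLEU = {bleu.score:.1f}")    # the paper reports 34.8 (LSTM) vs. 33.3 (SMT)
```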