Multi-task Sequence to Sequence Learning

Authors: Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks. Furthermore, we have established a new state-of-the-art result in constituent parsing with 93.0 F1. Lastly, we reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context: autoencoder helps less in terms of perplexities but more on BLEU scores compared to skip-thought.
Researcher Affiliation | Collaboration | Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser (Google Brain); lmthang@stanford.edu, {qvl,ilyasu,vinyals,lukaszkaiser}@google.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it mention a specific repository link or explicit code release statement.
Open Datasets | Yes | We use the WMT 15 data (Bojar et al., 2015) for the English-German translation problem. ... the Penn Tree Bank (PTB) dataset (Marcus et al., 1993) and ... the high-confidence (HC) parse trees provided by Vinyals et al. (2015a). Lastly, for image caption generation, we use a dataset of image and caption pairs provided by Vinyals et al. (2015b).
Dataset Splits | Yes | We use newstest2013 (3000 sentences) as a validation set to select our hyperparameters... For testing, to be comparable with existing results in (Luong et al., 2015a), we use the filtered newstest2014 (2737 sentences) for the English-German translation task and newstest2015 (2169 sentences) for the German-English task. ... The two parsing tasks, however, are evaluated on the same validation (section 22) and test (section 23) sets from the PTB data.
Hardware Specification | No | The paper describes model architecture and training parameters (e.g., '4 LSTM layers each of which has 1000-dimensional cells and embeddings', 'mini-batch size of 128') but does not specify any hardware details like GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions 'Moses' for tokenization and 'SGD' for optimization, but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | In all experiments, following Sutskever et al. (2014) and Luong et al. (2015b), we train deep LSTM models as follows: (a) we use 4 LSTM layers each of which has 1000-dimensional cells and embeddings, (b) parameters are uniformly initialized in [-0.06, 0.06], (c) we use a mini-batch size of 128, (d) dropout is applied with probability of 0.2 over vertical connections (Pham et al., 2014), (e) we use SGD with a fixed learning rate of 0.7, (f) input sequences are reversed, and lastly, (g) we use a simple finetuning schedule: after x epochs, we halve the learning rate every y epochs. The values x and y are referred to as finetune start and finetune cycle in Table 1, together with the number of training epochs per task.
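
The quoted setup amounts to a small hyperparameter configuration plus a fixed-then-halved SGD learning-rate schedule. Below is a minimal Python sketch of one reading of that schedule; it is not the authors' code (none is released, per the Open Source Code row), and the names `HPARAMS`, `learning_rate`, and the concrete `finetune_start`/`finetune_cycle` values are illustrative placeholders rather than values from the paper, which reports the per-task x and y in its Table 1.

```python
# Sketch of the training hyperparameters and LR schedule quoted above.
# All numbers below marked "illustrative" are assumptions, not paper values.

HPARAMS = {
    "lstm_layers": 4,        # (a) 4 LSTM layers
    "hidden_size": 1000,     # (a) 1000-dimensional cells and embeddings
    "init_range": 0.06,      # (b) uniform init in [-0.06, 0.06]
    "batch_size": 128,       # (c) mini-batch size of 128
    "dropout": 0.2,          # (d) dropout on vertical connections
    "base_lr": 0.7,          # (e) fixed SGD learning rate
    "reverse_source": True,  # (f) input sequences are reversed
}


def learning_rate(epoch: int, base_lr: float,
                  finetune_start: int, finetune_cycle: int) -> float:
    """Return the SGD learning rate for a 1-indexed training epoch.

    One reading of schedule (g): keep base_lr for the first
    `finetune_start` (x) epochs, then halve it once every
    `finetune_cycle` (y) epochs after that.
    """
    if epoch <= finetune_start:
        return base_lr
    num_halvings = (epoch - finetune_start + finetune_cycle - 1) // finetune_cycle
    return base_lr * (0.5 ** num_halvings)


if __name__ == "__main__":
    # Illustrative values only: start halving after epoch 6, then every epoch.
    for epoch in range(1, 13):
        lr = learning_rate(epoch, HPARAMS["base_lr"],
                           finetune_start=6, finetune_cycle=1)
        print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

Running the sketch prints a learning rate of 0.7 for epochs 1-6 and then 0.35, 0.175, and so on, which is one way to interpret the "finetune start / finetune cycle" schedule described in the row above.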