Multi-task Sequence to Sequence Learning
Authors: Minh-Thang Luong, Quoc Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks. Furthermore, we have established a new state-of-the-art result in constituent parsing with 93.0 F1. Lastly, we reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context: autoencoder helps less in terms of perplexities but more on BLEU scores compared to skip-thought. |
| Researcher Affiliation | Collaboration | Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser — Google Brain — lmthang@stanford.edu, {qvl,ilyasu,vinyals,lukaszkaiser}@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it mention a specific repository link or explicit code release statement. |
| Open Datasets | Yes | We use the WMT 15 data (Bojar et al., 2015) for the English→German translation problem. ... Penn Tree Bank (PTB) dataset (Marcus et al., 1993) and, ...the high-confidence (HC) parse trees provided by Vinyals et al. (2015a). Lastly, for image caption generation, we use a dataset of image and caption pairs provided by Vinyals et al. (2015b). |
| Dataset Splits | Yes | We use newstest2013 (3000 sentences) as a validation set to select our hyperparameters... For testing, to be comparable with existing results in (Luong et al., 2015a), we use the filtered newstest2014 (2737 sentences) for the English→German translation task and newstest2015 (2169 sentences) for the German→English task. ... The two parsing tasks, however, are evaluated on the same validation (section 22) and test (section 23) sets from the PTB data. |
| Hardware Specification | No | The paper describes model architecture and training parameters (e.g., '4 LSTM layers each of which has 1000-dimensional cells and embeddings', 'mini-batch size of 128') but does not specify any hardware details like GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions 'Moses' for tokenization and 'SGD' for optimization, but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | In all experiments, following Sutskever et al. (2014) and Luong et al. (2015b), we train deep LSTM models as follows: (a) we use 4 LSTM layers each of which has 1000-dimensional cells and embeddings, (b) parameters are uniformly initialized in [-0.06, 0.06], (c) we use a mini-batch size of 128, (d) dropout is applied with probability of 0.2 over vertical connections (Pham et al., 2014), (e) we use SGD with a fixed learning rate of 0.7, (f) input sequences are reversed, and lastly, (g) we use a simple finetuning schedule: after x epochs, we halve the learning rate every y epochs. The values x and y are referred to as finetune start and finetune cycle in Table 1 together with the number of training epochs per task. |
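The finetuning schedule in item (g) can be sketched as a small learning-rate function. This is an illustrative reconstruction, not the authors' code: the function name `lr_at_epoch` and the example values `finetune_start=8`, `finetune_cycle=1` are hypothetical (the paper's actual per-task values live in its Table 1); only the base rate of 0.7 and the halving rule come from the quoted setup.

```python
def lr_at_epoch(epoch, base_lr=0.7, finetune_start=8, finetune_cycle=1):
    """SGD learning rate under the paper's schedule: keep base_lr until
    `finetune_start` epochs have elapsed, then halve it every
    `finetune_cycle` epochs. finetune_start/finetune_cycle defaults are
    hypothetical placeholders for the paper's Table 1 values."""
    if epoch < finetune_start:
        return base_lr
    # Number of halvings applied so far: one at finetune_start,
    # then one more every finetune_cycle epochs after that.
    halvings = 1 + (epoch - finetune_start) // finetune_cycle
    return base_lr / (2 ** halvings)
```

For example, with `finetune_start=8` and `finetune_cycle=1`, the rate stays at 0.7 through epoch 7, drops to 0.35 at epoch 8, and to 0.175 at epoch 9.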