Sequence Level Training with Recurrent Neural Networks
Authors: Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our second contribution is a thorough empirical evaluation on three different tasks, namely, Text Summarization, Machine Translation and Image Captioning. We compare against several strong baselines... Our results show that MIXER with a simple greedy search achieves much better accuracy compared to the baselines on all the three tasks. |
| Researcher Affiliation | Industry | Facebook AI Research {ranzato, spchopra, michaelauli, wojciech}@fb.com |
| Pseudocode | Yes | Algorithm 1: MIXER pseudo-code. (A hedged sketch of the annealing schedule appears below the table.) |
| Open Source Code | Yes | Code available at: https://github.com/facebookresearch/MIXER |
| Open Datasets | Yes | The data set we use to train and evaluate our models consists of a subset of the Gigaword corpus (Graff et al., 2003)... We use data from the German English machine translation track of the IWSLT 2014 evaluation campaign (Cettolo et al., 2014)... For the image captioning task, we use the MSCOCO dataset (Lin et al., 2014). |
| Dataset Splits | Yes | The number of sample pairs in the training, validation and test set are 179414, 22568, and 22259 respectively. (for summarization) ... The training data comprises of about 153000 sentences... Our validation set comprises of 6969 sentence pairs... The test set is a concatenation of dev2010, dev2012, tst2010, tst2011 and tst2012 which results in 6750 sentence pairs. (for machine translation) ... We use the entire training set provided by the authors, which consists of around 80k images. We then took the original validation set (consisting of around 40k images) and randomly sampled (without replacement) 5000 images for validation and another 5000 for test. (for image captioning) (A sketch of the sampling step appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'tokenizer of the Moses toolkit' and 'Convolutional Neural Network (CNN) trained on the Imagenet dataset', but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For training, we use stochastic gradient descent with mini-batches of size 32 and we reset the hidden states at the beginning of each sequence. Before updating the parameters we re-scale the gradients if their norm is above 10... We search over the values of hyper-parameters, such as the initial learning rate, the various scheduling parameters, number of epochs, etc., using a held-out validation set. Table 2: Best scheduling parameters found by hyper-parameter search of MIXER. (A sketch of the gradient re-scaling rule appears below the table.) |
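The Pseudocode row points at Algorithm 1 (MIXER). As a hedged illustration of the schedule that algorithm describes, the Python sketch below implements only the annealing logic: the model first trains with cross-entropy (XENT) on full sequences, after which a suffix of each sequence that grows over epochs is trained with REINFORCE instead. Every name and default value here (`mixer_split`, `n_xent`, `anneal_every`, `delta`) is illustrative, not taken from the paper or the released code.

```python
# Hypothetical sketch of MIXER's annealing schedule; parameter names and
# defaults are illustrative, not from the paper or facebookresearch/MIXER.

def mixer_split(epoch, seq_len, n_xent=20, anneal_every=5, delta=2):
    """Return (xent_steps, reinforce_steps) for the given epoch."""
    if epoch < n_xent:
        return seq_len, 0  # warm-up: pure cross-entropy on the whole sequence
    # Afterwards, REINFORCE takes over a suffix that grows by `delta`
    # steps every `anneal_every` epochs, up to the full sequence length.
    grown = delta * (1 + (epoch - n_xent) // anneal_every)
    reinforce_steps = min(seq_len, grown)
    return seq_len - reinforce_steps, reinforce_steps

if __name__ == "__main__":
    for epoch in range(0, 60, 10):
        print(epoch, mixer_split(epoch, seq_len=25))
```

In the paper, the best values of these scheduling parameters are the ones reported in Table 2, found by hyper-parameter search on the validation set.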
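Similarly, the MSCOCO re-split quoted in the Dataset Splits row (5000 validation and 5000 test images drawn without replacement from the original ~40k-image validation set) is mechanically simple. Here is a sketch under the assumption that images are identified by ids; the function name, seed, and interface are assumptions, not the authors' code.

```python
import random

# Illustrative reconstruction of the MSCOCO re-split described above;
# the function name, seed, and id-based interface are assumptions.

def split_coco_val(image_ids, n_valid=5000, n_test=5000, seed=0):
    """Draw disjoint validation and test subsets without replacement."""
    rng = random.Random(seed)
    sampled = rng.sample(image_ids, n_valid + n_test)  # no replacement
    return sampled[:n_valid], sampled[n_valid:]
```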
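Finally, the Experiment Setup row quotes a gradient re-scaling rule: gradients are re-scaled before each update whenever their norm exceeds 10. A minimal NumPy sketch of that rule, assuming the gradients arrive as a list of arrays (one per parameter), could read:

```python
import numpy as np

# Sketch of global-norm gradient re-scaling as quoted above; this is the
# common clipping recipe, not necessarily the authors' implementation.

def rescale_gradients(grads, max_norm=10.0):
    """Scale all gradients down if their global L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```

Combined with the quoted mini-batch size of 32 and plain SGD, this is a standard recipe for keeping recurrent-network training stable.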