Sequence Level Training with Recurrent Neural Networks

Authors: Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our second contribution is a thorough empirical evaluation on three different tasks, namely, Text Summarization, Machine Translation and Image Captioning. We compare against several strong baselines... Our results show that MIXER with a simple greedy search achieves much better accuracy compared to the baselines on all the three tasks.
Researcher Affiliation | Industry | Facebook AI Research, {ranzato, spchopra, michaelauli, wojciech}@fb.com
Pseudocode | Yes | Algorithm 1: MIXER pseudo-code. (An illustrative sketch of the annealing schedule appears after this table.)
Open Source Code | Yes | Code available at: https://github.com/facebookresearch/MIXER
Open Datasets | Yes | The data set we use to train and evaluate our models consists of a subset of the Gigaword corpus (Graff et al., 2003)... We use data from the German English machine translation track of the IWSLT 2014 evaluation campaign (Cettolo et al., 2014)... For the image captioning task, we use the MSCOCO dataset (Lin et al., 2014).
Dataset Splits | Yes | Summarization: The number of sample pairs in the training, validation and test set are 179414, 22568, and 22259 respectively. Machine translation: The training data comprises of about 153000 sentences... Our validation set comprises of 6969 sentence pairs... The test set is a concatenation of dev2010, dev2012, tst2010, tst2011 and tst2012 which results in 6750 sentence pairs. Image captioning: We use the entire training set provided by the authors, which consists of around 80k images. We then took the original validation set (consisting of around 40k images) and randomly sampled (without replacement) 5000 images for validation and another 5000 for test. (An illustrative split sketch appears after this table.)
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models) used to run its experiments.
Software Dependencies | No | The paper mentions using the 'tokenizer of the Moses toolkit' and a 'Convolutional Neural Network (CNN) trained on the Imagenet dataset', but it does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | For training, we use stochastic gradient descent with mini-batches of size 32 and we reset the hidden states at the beginning of each sequence. Before updating the parameters we re-scale the gradients if their norm is above 10... We search over the values of hyper-parameters, such as the initial learning rate, the various scheduling parameters, number of epochs, etc., using a held-out validation set. Table 2: Best scheduling parameters found by hyper-parameter search of MIXER. (An illustrative training-step sketch appears after this table.)
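Algorithm 1 in the paper (MIXER) anneals from cross-entropy (XENT) training to REINFORCE along the time steps of each target sequence. The Python sketch below illustrates only that annealing schedule, not the authors' Torch implementation; every constant (T, N_XENT, N_EPOCHS, DELTA, EPOCHS_PER_STAGE) is a placeholder chosen for the example rather than a value from the paper.

```python
# Minimal sketch of the MIXER annealing schedule (illustration only).
# All constants are placeholders, not values reported in the paper.
T = 25                 # assumed maximum target sequence length
N_XENT = 20            # epochs of pure cross-entropy pretraining (placeholder)
N_EPOCHS = 40          # total number of training epochs (placeholder)
DELTA = 3              # steps moved from XENT to REINFORCE at each stage (placeholder)
EPOCHS_PER_STAGE = 2   # epochs spent at each XENT/REINFORCE boundary (placeholder)

def xent_steps(epoch):
    """How many of the T time steps are trained with cross-entropy at this
    epoch; the remaining steps use REINFORCE on the model's own samples."""
    if epoch < N_XENT:
        return T                          # first stage: every step uses cross-entropy
    stage = (epoch - N_XENT) // EPOCHS_PER_STAGE + 1
    return max(T - stage * DELTA, 0)      # anneal toward training fully with REINFORCE

for epoch in range(N_EPOCHS):
    k = xent_steps(epoch)
    if k == T:
        print(f"epoch {epoch:2d}: all {T} steps use XENT")
    elif k == 0:
        print(f"epoch {epoch:2d}: all {T} steps use REINFORCE")
    else:
        print(f"epoch {epoch:2d}: steps 1..{k} use XENT, steps {k+1}..{T} use REINFORCE")
```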
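The image-captioning split is described procedurally: 5000 validation and 5000 test images sampled without replacement from the original MSCOCO validation set of around 40k images. The sketch below is a minimal illustration of that sampling, assuming the standard MSCOCO annotation file layout; the file path, the seed, and the use of Python's random module are assumptions, not details taken from the paper.

```python
# Hypothetical reconstruction of the MSCOCO validation/test split described above.
import json
import random

random.seed(0)  # fixed seed so the split is reproducible across runs (assumption)

# Standard MSCOCO annotation file; the path is an assumption for illustration.
with open("annotations/captions_val2014.json") as f:
    image_ids = [img["id"] for img in json.load(f)["images"]]  # ~40k validation images

# Sample 10000 distinct images (without replacement), then split them in half.
sampled = random.sample(image_ids, 10000)
val_ids, test_ids = sampled[:5000], sampled[5000:]
```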
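The experiment-setup quote (mini-batches of 32, hidden states reset at the start of each sequence, gradients re-scaled when their norm exceeds 10) corresponds to a short training step. The sketch below uses PyTorch as a stand-in for the authors' Torch code; the model, vocabulary size, and learning rate are placeholders rather than values reported in the paper.

```python
# Illustrative PyTorch training step matching the quoted optimization settings.
import torch
import torch.nn as nn

model = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)  # placeholder recurrent model
readout = nn.Linear(256, 10000)                                     # placeholder vocabulary size
params = list(model.parameters()) + list(readout.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)  # initial learning rate is a placeholder

def train_step(inputs, targets):
    """One SGD update on a mini-batch of shape (32, T, 256) with targets (32, T)."""
    optimizer.zero_grad()
    outputs, _ = model(inputs)        # no hidden state passed in: reset to zero each sequence
    logits = readout(outputs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    # Re-scale the gradients if their norm is above 10, as described in the setup.
    torch.nn.utils.clip_grad_norm_(params, max_norm=10.0)
    optimizer.step()
    return loss.item()
```

Calling the LSTM without an explicit hidden state starts each sequence from a zero state, matching the "reset the hidden states at the beginning of each sequence" detail, and clip_grad_norm_ with max_norm=10.0 implements the gradient re-scaling.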