Recurrent Batch Normalization
Authors: Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization. |
| Researcher Affiliation | Academia | Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre & Aaron Courville, MILA, Université de Montréal, firstname.lastname@umontreal.ca |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the release of their source code. It mentions using Theano, Blocks, and Fuel libraries, but these are third-party tools. |
| Open Datasets | Yes | We evaluate our batch-normalized LSTM on a sequential version of the MNIST classification task (Le et al., 2015). ... We evaluate our model on the task of character-level language modeling on the Penn Treebank corpus (Marcus et al., 1993) according to the train/valid/test partition of Mikolov et al. (2012). ... We evaluate our model on a second character-level language modeling task on the much larger text8 dataset (Mahoney, 2009). ... We evaluate the models on the question answering task using the CNN corpus (Hermann et al., 2015). |
| Dataset Splits | Yes | We evaluate our model on the task of character-level language modeling on the Penn Treebank corpus (Marcus et al., 1993) according to the train/valid/test partition of Mikolov et al. (2012). ... we use the first 90M characters for training, the next 5M for validation and the final 5M characters for testing. |
| Hardware Specification | No | The paper mentions general computing support from "Calcul Québec, Compute Canada" in the acknowledgements, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions using "Theano (Team et al., 2016) and the Blocks and Fuel (van Merriënboer et al., 2015) libraries for scientific computing" but does not specify version numbers for these software components. |
| Experiment Setup | Yes | Note that for all the experiments, we initialize the batch normalization scale and shift parameters γ and β to 0.1 and 0 respectively. ... The model is trained using RMSProp (Tieleman & Hinton, 2012) with learning rate of 10⁻³ and 0.9 momentum. We apply gradient clipping at 1 to avoid exploding gradients. ... For the reported performances, the first three models (LSTM, BN-LSTM and BN-everywhere) are trained using the exact same hyperparameters... We use stochastic gradient descent on minibatches of size 64, with gradient clipping at 10 and step rule determined by Adam (Kingma & Ba, 2014) with learning rate 8 × 10⁻⁵. ... Appendix D provides tables with hyperparameter values tried for different tasks, including learning rate, RMSProp momentum, hidden state size, initial γ, and batch size. |
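
The Experiment Setup row quotes the batch-normalization parameter initialization (γ = 0.1, β = 0) without showing where the normalization sits in the recurrence. The sketch below is a minimal NumPy illustration of a single BN-LSTM step in the spirit of the paper's equations. It is not the authors' Theano/Blocks code (which, per the Open Source Code row, was not released); the function and variable names are ours, and it covers only training-time minibatch statistics, omitting the per-timestep population statistics the paper uses at evaluation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the minibatch dimension, then scale by gamma and shift by beta.
    # Training-time behaviour only: the paper additionally maintains separate
    # per-timestep population statistics for use at test time.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bn_lstm_step(x_t, h_prev, c_prev, W_x, W_h, b,
                 gamma_x, gamma_h, gamma_c, beta_c):
    # Input-to-hidden and hidden-to-hidden projections are normalized separately.
    # Their shift parameters are fixed at 0 (the bias b already plays that role),
    # and every gamma is initialized to 0.1, matching the Experiment Setup row.
    gates = (batch_norm(x_t @ W_x, gamma_x, 0.0)
             + batch_norm(h_prev @ W_h, gamma_h, 0.0)
             + b)
    i, f, o, g = np.split(gates, 4, axis=1)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(batch_norm(c_t, gamma_c, beta_c))
    return h_t, c_t
```

Here `W_x` has shape `(input_dim, 4 * hidden)` and `W_h` has shape `(hidden, 4 * hidden)`; under the paper's initialization, `gamma_x` and `gamma_h` would start as `np.full(4 * hidden, 0.1)`, `gamma_c` as `np.full(hidden, 0.1)`, and `beta_c` as `np.zeros(hidden)`.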