Recurrent Normalization Propagation
Authors: César Laurent, Nicolas Ballas, Pascal Vincent
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our proposal on character-level language modelling on the Penn Treebank corpus (Marcus et al., 1993) and on image generative modelling, applying our normalisation to the DRAW architecture (Gregor et al., 2015). We empirically show that it performs similarly or better than other recurrent normalization approaches, while being faster to execute. |
| Researcher Affiliation | Academia | César Laurent, Nicolas Ballas & Pascal Vincent, Montreal Institute for Learning Algorithms (MILA), Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Québec, Canada. {firstname.lastname}@umontreal.ca. Associate Fellow, Canadian Institute For Advanced Research (CIFAR) |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | We use Jörg Bornschein's implementation (https://github.com/jbornschein/draw), with the same hyper-parameters as Gregor et al. (2015), i.e. the read and write sizes are 2x2 and 5x5 respectively, the number of glimpses is 64, the LSTMs have 256 units and the dimension of z is 100. |
| Open Datasets | Yes | We empirically validate our proposal on character-level language modelling on the Penn Treebank corpus (Marcus et al., 1993) and on image generative modelling, applying our normalisation to the DRAW architecture (Gregor et al., 2015). The second task we explore is a generative modelling task on binarized MNIST (Larochelle & Murray, 2011) using the Deep Recurrent Attentive Writer (DRAW) (Gregor et al., 2015) architecture. |
| Dataset Splits | Yes | We use the same splits as Mikolov et al. (2012) and the same training procedure as Cooijmans et al. (2016), i.e. we train on sequences of length 100, with random starting point. Table 1: Perplexity (bits-per-character) on sequences of length 100 from the Penn Treebank validation set, and training time (seconds) per epoch. |
| Hardware Specification | Yes | 2The GPU used is a NVIDIA GTX 750. |
| Software Dependencies | No | We used Theano (Theano Development Team, 2016), Blocks and Fuel (van Merriënboer et al., 2015) for our experiments. |
| Experiment Setup | Yes | To compare the convergence properties of Norm Prop against LN and BN, we first ran experiments using Adam (Kingma & Ba, 2014) with learning rate 2e-3, exponential decay of 1e-3 and gradient clipping at 1.0. For Norm Prop, we use γx = γh = 2 and γc = 1, for LN all the γ = 1.0 and for BN all the γ = 0.1. We use Adam with learning rate of 1e-2, exponential decay of 1e-3 and mini-batch size of 128. For Norm Prop, we use γx = γh = γc = 0.5. |
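
The "Dataset Splits" row reports that the character-level Penn Treebank models are trained on sequences of length 100 drawn from random starting points. The following is a minimal sketch of that sampling procedure; the file path, character-to-id encoding, and default batch size are illustrative assumptions, not taken from the paper.

```python
# Sketch of the training-data sampling described in the "Dataset Splits" row:
# character-level Penn Treebank, length-100 windows with random starting points.
import numpy as np

def load_char_corpus(path="ptb.char.train.txt"):
    # Read the training text and map each character to an integer id.
    with open(path) as f:
        text = f.read()
    vocab = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(vocab)}
    return np.array([char_to_id[c] for c in text], dtype=np.int64), vocab

def sample_batch(corpus, batch_size=32, seq_len=100, rng=np.random):
    # Draw `batch_size` windows of length `seq_len` from random starting points;
    # targets are the inputs shifted by one character (next-character prediction).
    starts = rng.randint(0, len(corpus) - seq_len - 1, size=batch_size)
    inputs = np.stack([corpus[s:s + seq_len] for s in starts])
    targets = np.stack([corpus[s + 1:s + seq_len + 1] for s in starts])
    return inputs, targets
```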
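
The "Experiment Setup" row gives the optimizer settings for the Penn Treebank convergence comparison (Adam, learning rate 2e-3, exponential decay 1e-3, gradient clipping at 1.0) and the initial scale parameters γ for each normalization variant. Below is a minimal sketch of that configuration written with PyTorch for concreteness; the model definition is a placeholder, and interpreting "exponential decay of 1e-3" as a per-epoch multiplicative learning-rate decay is an assumption, since the paper does not spell out the schedule.

```python
# Sketch (not the authors' code) of the optimizer setup in the "Experiment Setup" row.
import torch

model = torch.nn.LSTM(input_size=50, hidden_size=1000)  # placeholder network, sizes illustrative

optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
# Assumed reading of "exponential decay of 1e-3": multiply the learning rate by
# (1 - 1e-3) each epoch, i.e. call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 - 1e-3)

# Initial scale (gamma) parameters reported in the paper for each variant.
GAMMAS = {
    "norm_prop":  {"gamma_x": 2.0, "gamma_h": 2.0, "gamma_c": 1.0},
    "layer_norm": {"gamma_x": 1.0, "gamma_h": 1.0, "gamma_c": 1.0},
    "batch_norm": {"gamma_x": 0.1, "gamma_h": 0.1, "gamma_c": 0.1},
}

def training_step(batch_inputs, batch_targets, loss_fn):
    optimizer.zero_grad()
    outputs, _ = model(batch_inputs)
    loss = loss_fn(outputs, batch_targets)
    loss.backward()
    # Gradient clipping at 1.0, as stated in the experiment setup.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    return loss.item()
```

For the DRAW experiments the same row reports a different setting (Adam with learning rate 1e-2, exponential decay 1e-3, mini-batch size 128, and γx = γh = γc = 0.5), which would replace the corresponding values above.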