Recurrent Normalization Propagation
Authors: César Laurent, Nicolas Ballas, Pascal Vincent
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our proposal on character-level language modelling on the Penn Treebank corpus (Marcus et al., 1993) and on image generative modelling, applying our normalisation to the DRAW architecture (Gregor et al., 2015). We empirically show that it performs similarly or better than other recurrent normalization approaches, while being faster to execute. |
| Researcher Affiliation | Academia | César Laurent, Nicolas Ballas & Pascal Vincent, Montreal Institute for Learning Algorithms (MILA), Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Québec, Canada. {firstname.lastname}@umontreal.ca. Associate Fellow, Canadian Institute For Advanced Research (CIFAR) |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | We use Jörg Bornschein's implementation (https://github.com/jbornschein/draw), with the same hyper-parameters as Gregor et al. (2015), i.e. the read and write sizes are 2x2 and 5x5 respectively, the number of glimpses is 64, the LSTMs have 256 units and the dimension of z is 100. |
| Open Datasets | Yes | We empirically validate our proposal on character-level language modelling on the Penn Treebank corpus (Marcus et al., 1993) and on image generative modelling, applying our normalisation to the DRAW architecture (Gregor et al., 2015). The second task we explore is a generative modelling task on binarized MNIST (Larochelle & Murray, 2011) using the Deep Recurrent Attentive Writer (DRAW) (Gregor et al., 2015) architecture. |
| Dataset Splits | Yes | We use the same splits as Mikolov et al. (2012) and the same training procedure as Cooijmans et al. (2016), i.e. we train on sequences of length 100, with random starting point. Table 1: Perplexity (bits-per-character) on sequences of length 100 from the Penn Treebank validation set, and training time (seconds) per epoch. |
| Hardware Specification | Yes | 2The GPU used is a NVIDIA GTX 750. |
| Software Dependencies | No | We used Theano (Theano Development Team, 2016), Blocks and Fuel (van Merriënboer et al., 2015) for our experiments. |
| Experiment Setup | Yes | To compare the convergence properties of Norm Prop against LN and BN, we first ran experiments using Adam (Kingma & Ba, 2014) with learning rate 2e-3, exponential decay of 1e-3 and gradient clipping at 1.0. For Norm Prop, we use γx = γh = 2 and γc = 1, for LN all the γ = 1.0 and for BN all the γ = 0.1. We use Adam with learning rate of 1e-2, exponential decay of 1e-3 and mini-batch size of 128. For Norm Prop, we use γx = γh = γc = 0.5. |
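
The "Dataset Splits" row reports that the character-level Penn Treebank models are trained on sequences of length 100 drawn from random starting points. The following is a minimal sketch of that sampling procedure; the file path, character-to-id encoding, and default batch size are illustrative assumptions, not taken from the paper.

```python
# Sketch of the training-data sampling described in the "Dataset Splits" row:
# character-level Penn Treebank, length-100 windows with random starting points.
import numpy as np

def load_char_corpus(path="ptb.char.train.txt"):
    # Read the training text and map each character to an integer id.
    with open(path) as f:
        text = f.read()
    vocab = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(vocab)}
    return np.array([char_to_id[c] for c in text], dtype=np.int64), vocab

def sample_batch(corpus, batch_size=32, seq_len=100, rng=np.random):
    # Draw `batch_size` windows of length `seq_len` from random starting points;
    # targets are the inputs shifted by one character (next-character prediction).
    starts = rng.randint(0, len(corpus) - seq_len - 1, size=batch_size)
    inputs = np.stack([corpus[s:s + seq_len] for s in starts])
    targets = np.stack([corpus[s + 1:s + seq_len + 1] for s in starts])
    return inputs, targets
```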
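
The "Experiment Setup" row gives the optimizer settings for the Penn Treebank convergence comparison (Adam, learning rate 2e-3, exponential decay 1e-3, gradient clipping at 1.0) and the initial scale parameters γ for each normalization variant. Below is a minimal sketch of that configuration written with PyTorch for concreteness; the model definition is a placeholder, and interpreting "exponential decay of 1e-3" as a per-epoch multiplicative learning-rate decay is an assumption, since the paper does not spell out the schedule.

```python
# Sketch (not the authors' code) of the optimizer setup in the "Experiment Setup" row.
import torch

model = torch.nn.LSTM(input_size=50, hidden_size=1000)  # placeholder network, sizes illustrative

optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
# Assumed reading of "exponential decay of 1e-3": multiply the learning rate by
# (1 - 1e-3) each epoch, i.e. call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 - 1e-3)

# Initial scale (gamma) parameters reported in the paper for each variant.
GAMMAS = {
    "norm_prop":  {"gamma_x": 2.0, "gamma_h": 2.0, "gamma_c": 1.0},
    "layer_norm": {"gamma_x": 1.0, "gamma_h": 1.0, "gamma_c": 1.0},
    "batch_norm": {"gamma_x": 0.1, "gamma_h": 0.1, "gamma_c": 0.1},
}

def training_step(batch_inputs, batch_targets, loss_fn):
    optimizer.zero_grad()
    outputs, _ = model(batch_inputs)
    loss = loss_fn(outputs, batch_targets)
    loss.backward()
    # Gradient clipping at 1.0, as stated in the experiment setup.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    return loss.item()
```

For the DRAW experiments the same row reports a different setting (Adam with learning rate 1e-2, exponential decay 1e-3, mini-batch size 128, and γx = γh = γc = 0.5), which would replace the corresponding values above.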