Regularizing RNNs by Stabilizing Activations

Authors: David Krueger, Roland Memisevic

ICLR 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section "2 EXPERIMENTS"; "Table 1: LSTM Performance (bits-per-character) on Penn Treebank for different values of β"; "Table 4: Phoneme Error Rate (PER) on TIMIT for different experiment settings". (A sketch of the norm-stabilizer penalty that β controls appears after the table.)
Researcher Affiliation | Academia | David Krueger & Roland Memisevic, Department of Computer Science and Operations Research, University of Montreal, Montreal, QC H3T 1J4, Canada ({david.krueger@umontreal.ca, memisevr@iro.umontreal.ca})
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | Character-level language modeling on Penn Treebank (Marcus et al., 1993), phoneme recognition on the TIMIT dataset, and the adding task (Hochreiter & Schmidhuber, 1997).
Dataset Splits | Yes | "We early stop after 25 epochs without improvement on the development set."
Hardware Specification | Yes | "We appreciate the many k80 GPUs provided by Compute Canada."
Software Dependencies | No | The paper mentions Theano (Bastien et al., 2012) and Blocks (van Merriënboer et al., 2015) but does not provide specific version numbers for these dependencies as used in the experiments.
Experiment Setup | Yes | "Unless otherwise specified, we use 1000/1600 units for LSTM/SRNN, and SGD with learning rate=.002, momentum=.99, and gradient clipping=1. We train for a maximum of 1000 epochs and use sequences of length 50 taken without overlap." "We train with Adam (Kingma & Ba, 2014) using learning rate=.001 and gradient clipping=200. We early stop after 25 epochs without improvement on the development set." (A configuration sketch of these settings follows the table.)
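For context on the β swept in Table 1: it is the coefficient on the paper's norm-stabilizer, a penalty on changes in the Euclidean norm of successive hidden states that is added to the training cost. Below is a minimal NumPy sketch of that penalty; the function name, array shapes, and the illustrative β value are ours, not the paper's.

    import numpy as np

    def norm_stabilizer_penalty(hidden_states, beta):
        """Norm-stabilizer term: beta times the mean squared difference
        between the Euclidean norms of successive hidden states."""
        norms = np.linalg.norm(hidden_states, axis=1)          # ||h_t|| for each time step
        return beta * np.mean((norms[1:] - norms[:-1]) ** 2)   # squared successive differences

    # Illustrative usage: 50 time steps (the reported PTB sequence length), 1000 units.
    h = np.random.randn(50, 1000).astype(np.float32)
    penalty = norm_stabilizer_penalty(h, beta=1.0)  # added to the task loss during training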
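The hyperparameters quoted under Experiment Setup map directly onto a standard optimizer configuration. The sketch below uses PyTorch purely for illustration (the authors used Theano and Blocks); the input size and the assumption that "gradient clipping" refers to clipping the gradient norm are ours.

    import torch

    # 1000 LSTM units as reported; the input size is illustrative, not taken from the paper.
    model = torch.nn.LSTM(input_size=128, hidden_size=1000)

    # Default reported setting: SGD with learning rate .002, momentum .99, gradient clipping 1.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.99)
    clip_norm = 1.0

    # Alternative reported setting: Adam with learning rate .001 and gradient clipping 200.
    adam_optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    def training_step(loss):
        """One update with gradient clipping, matching the quoted setup."""
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_norm)
        optimizer.step()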