Regularizing RNNs by Stabilizing Activations
Authors: David Krueger, Roland Memisevic
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 2 (Experiments); "Table 1: LSTM Performance (bits-per-character) on Penn Treebank for different values of β"; "Table 4: Phoneme Error Rate (PER) on TIMIT for different experiment settings" |
| Researcher Affiliation | Academia | David Krueger & Roland Memisevic, Department of Computer Science and Operations Research, University of Montreal, Montreal, QC H3T 1J4, Canada; {david.krueger@umontreal.ca, memisevr@iro.umontreal.ca} |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide a link to, or other means of accessing, source code for the described methodology. |
| Open Datasets | Yes | character-level language modeling on Penn Treebank (Marcus et al., 1993), phoneme recognition on the TIMIT dataset, The adding task (Hochreiter & Schmidhuber, 1997) |
| Dataset Splits | Yes | We early stop after 25 epochs without improvement on the development set. |
| Hardware Specification | Yes | We appreciate the many K80 GPUs provided by Compute Canada. |
| Software Dependencies | No | The paper mentions 'Theano (Bastien et al., 2012) and Blocks (van Merriënboer et al., 2015)' but does not provide specific version numbers for these software dependencies as used in their experiments. |
| Experiment Setup | Yes | Unless otherwise specified, we use 1000/1600 units for LSTM/SRNN, and SGD with learning rate=.002, momentum=.99, and gradient clipping=1. We train for a maximum of 1000 epochs and use sequences of length 50 taken without overlap. [...] We train with Adam (Kingma & Ba, 2014) using learning rate=.001 and gradient clipping=200. We early stop after 25 epochs without improvement on the development set. (Illustrative sketches of these settings follow below the table.) |
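
For context on the method the table refers to, the paper's core regularizer (the β-weighted norm stabilizer swept in Table 1) penalizes changes in the L2 norm of successive hidden states. Below is a minimal PyTorch sketch of that penalty, not the authors' Theano/Blocks code; the tensor shapes and demo values are illustrative assumptions.

```python
import torch

def norm_stabilizer_penalty(hidden_states: torch.Tensor, beta: float) -> torch.Tensor:
    """Norm-stabilizer cost: beta times the squared difference between the
    L2 norms of successive hidden states h_t and h_{t-1}, averaged over
    time steps and the batch.

    hidden_states: tensor of shape (T, batch, hidden_dim).
    """
    norms = hidden_states.norm(p=2, dim=-1)   # (T, batch) norms ||h_t||
    diffs = norms[1:] - norms[:-1]            # (T-1, batch) successive differences
    return beta * diffs.pow(2).mean()

# Tiny usage example on random states (shapes and beta are illustrative only).
states = torch.randn(50, 32, 1000)            # T=50, batch=32, 1000 hidden units
penalty = norm_stabilizer_penalty(states, beta=1.0)
print(float(penalty))
```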
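
The quoted training settings could be wired up roughly as follows. This is a hedged PyTorch sketch assuming the quoted SGD configuration (learning rate .002, momentum .99, gradient clipping 1), 1000 LSTM units, and sequences of length 50; the model, data, and β value are synthetic placeholders, and it reuses `norm_stabilizer_penalty` from the sketch above.

```python
import torch
from torch.nn.utils import clip_grad_norm_

# Synthetic stand-ins for data and model; only the optimizer settings,
# gradient clipping threshold, unit count, and sequence length come from
# the quoted setup.
vocab, hidden, seq_len, batch = 50, 1000, 50, 32
model = torch.nn.LSTM(input_size=vocab, hidden_size=hidden)
readout = torch.nn.Linear(hidden, vocab)
params = list(model.parameters()) + list(readout.parameters())
optimizer = torch.optim.SGD(params, lr=0.002, momentum=0.99)

inputs = torch.randn(seq_len, batch, vocab)                # random placeholder inputs
targets = torch.randint(0, vocab, (seq_len, batch))        # random placeholder targets

# One training step: task loss plus the norm-stabilizer penalty.
optimizer.zero_grad()
states, _ = model(inputs)                                  # (T, batch, hidden)
logits = readout(states)                                   # (T, batch, vocab)
task_loss = torch.nn.functional.cross_entropy(
    logits.reshape(-1, vocab), targets.reshape(-1)
)
loss = task_loss + norm_stabilizer_penalty(states, beta=1.0)
loss.backward()
clip_grad_norm_(params, max_norm=1.0)                      # gradient clipping = 1
optimizer.step()
```

For the Adam configuration quoted in the same row (learning rate .001, gradient clipping 200), one would substitute `torch.optim.Adam(params, lr=0.001)` and `max_norm=200.0` in the same loop.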