Hierarchical Multiscale Recurrent Neural Networks
Authors: Junyoung Chung, Sungjin Ahn, Yoshua Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on two tasks: character-level language modelling and handwriting sequence generation. For character-level language modelling, the HM-RNN achieves state-of-the-art results on the Text8 dataset, and results comparable to the state of the art on the Penn Treebank and Hutter Prize Wikipedia datasets. The HM-RNN also outperforms the standard RNN on handwriting sequence generation using the IAM-OnDB dataset. |
| Researcher Affiliation | Academia | Junyoung Chung, Sungjin Ahn & Yoshua Bengio, Département d'informatique et de recherche opérationnelle, Université de Montréal, {junyoung.chung,sungjin.ahn,yoshua.bengio}@umontreal.ca |
| Pseudocode | No | The paper describes the operations (UPDATE, COPY, FLUSH) and their mathematical formulations (Eq. 1-7) but does not include a block explicitly labeled as "Pseudocode" or "Algorithm". (A hedged sketch of these operations is given after the table.) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the proposed HM-RNN model is openly available. |
| Open Datasets | Yes | We evaluate our model on three benchmark text corpora: (1) Penn Treebank, (2) Text8 and (3) Hutter Prize Wikipedia. ... We process the Penn Treebank dataset (Marcus et al., 1993)... The Text8 dataset (Mahoney, 2009)... The Hutter Prize Wikipedia (enwik8) dataset (Hutter, 2012)... We extend the evaluation of the HM-LSTM to a real-valued sequence modelling task using the IAM-OnDB (Liwicki & Bunke, 2005) dataset. |
| Dataset Splits | Yes | We use 10,465 sequences for training, 581 for validation, and 582 for test. The average length of the sequences is 648. (IAM-OnDB) AND We follow the data splits used in Graves (2013), where the first 90M characters are used to train the model, the next 5M characters for validation, and the remainder for the test set. (Hutter Prize Wikipedia; see the split sketch after the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to conduct the experiments. |
| Software Dependencies | No | The authors thank the developers of Theano (Team et al., 2016). However, a specific version number for Theano or any other critical software dependency is not provided. |
| Experiment Setup | Yes | Each update is done by using a mini-batch of 64 examples of length 100... We train the model using Adam (Kingma & Ba, 2014) with an initial learning rate of 0.002. We divide the learning rate by a factor of 50 when the validation negative log-likelihood stops decreasing. The norm of the gradient is clipped with a threshold of 1... We increased the slope a with the schedule a = min(5, 1 + 0.04 * N_epoch)... For the Penn Treebank dataset, we use 512 units in each layer of the HM-LSTM and for the output embedding layer. (A configuration sketch follows the table.) |
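
The Pseudocode row notes that the UPDATE, COPY, and FLUSH operations are given only as equations (Eq. 1-7). Below is a minimal NumPy sketch of a single HM-LSTM layer step that selects between those three operations. The weight names (`W`, `U`, `V`, `b`), the scalar handling of the boundary pre-activation, and the inference-time thresholding of the boundary detector are assumptions for illustration; this is not the authors' Theano implementation, which additionally trains the boundary with a straight-through estimator.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigm(x, a):
    # Hard sigmoid with slope a, used for the boundary detector.
    return np.clip((a * x + 1.0) / 2.0, 0.0, 1.0)

def hm_lstm_step(c_prev, h_prev, z_prev, h_below, z_below, h_above, params, slope):
    """One step of one HM-LSTM layer, choosing between UPDATE, COPY and FLUSH.

    c_prev, h_prev   -- this layer's previous cell and hidden state, shape (n,)
    z_prev           -- this layer's boundary at the previous step (0.0 or 1.0)
    z_below, h_below -- boundary and hidden state coming up from the layer below
    h_above          -- previous hidden state of the layer above (top-down input)
    params           -- dict with weights W (4n+1, d_below), U (4n+1, n),
                        V (4n+1, d_above) and bias b (4n+1,) (hypothetical names)
    slope            -- current slope of the hard sigmoid (annealed over epochs)
    """
    W, U, V, b = params["W"], params["U"], params["V"], params["b"]
    n = c_prev.shape[0]

    # Pre-activations: the bottom-up term is gated by z_below, the top-down term by z_prev.
    s = (z_below * (W @ h_below)
         + U @ h_prev
         + z_prev * (V @ h_above)
         + b)
    f, i, o, g, z_tilde = np.split(s, [n, 2 * n, 3 * n, 4 * n])
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)

    if z_prev == 1.0:            # FLUSH: a boundary was detected here last step,
        c = i * g                #        so start a fresh segment (drop the old cell).
        h = o * np.tanh(c)
    elif z_below == 1.0:         # UPDATE: a new summary arrived from below,
        c = f * c_prev + i * g   #         so do an ordinary LSTM-style update.
        h = o * np.tanh(c)
    else:                        # COPY: nothing new from below, keep the state as is.
        c, h = c_prev, h_prev

    # Boundary detector: hard sigmoid on the extra pre-activation, binarized here by
    # simple thresholding (the paper trains it with a straight-through estimator).
    z = float(hard_sigm(z_tilde[0], slope) > 0.5)
    return c, h, z
```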
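
For the Hutter Prize Wikipedia split quoted in the Dataset Splits row, a helper like the following reproduces the 90M / 5M / remainder character split. The function name and the character-level slicing are assumptions; the paper only states the split sizes, following Graves (2013).

```python
def split_enwik8(data):
    """Split the enwik8 character stream as quoted above:
    first 90M characters for training, next 5M for validation, rest for test."""
    train = data[:90_000_000]
    valid = data[90_000_000:95_000_000]
    test = data[95_000_000:]
    return train, valid, test
```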
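
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The dictionary keys, the epoch-indexed slope schedule, and the NumPy gradient-clipping helper are illustrative assumptions; the original experiments were run in Theano and no code is released.

```python
import numpy as np

# Hyperparameters as quoted in the Experiment Setup row (key names are hypothetical).
CONFIG = {
    "batch_size": 64,          # mini-batch of 64 examples
    "seq_length": 100,         # sequences of length 100 per update
    "optimizer": "Adam",
    "learning_rate": 0.002,    # initial learning rate
    "lr_divide_factor": 50,    # applied when validation NLL stops decreasing
    "grad_clip_norm": 1.0,     # gradient-norm clipping threshold
    "hidden_units": 512,       # per HM-LSTM layer and output embedding (Penn Treebank)
}

def slope(n_epoch):
    # Slope-annealing schedule for the boundary detector's hard sigmoid:
    # a = min(5, 1 + 0.04 * N_epoch).
    return min(5.0, 1.0 + 0.04 * n_epoch)

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale a list of gradient arrays so their global L2 norm is at most max_norm.
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-8))
    return [g * scale for g in grads]
```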