Hierarchical Multiscale Recurrent Neural Networks
Authors: Junyoung Chung, Sungjin Ahn, Yoshua Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on two tasks: character-level language modelling and handwriting sequence generation. For character-level language modelling, the HM-RNN achieves state-of-the-art results on the Text8 dataset, and results comparable to the state of the art on the Penn Treebank and Hutter Prize Wikipedia datasets. The HM-RNN also outperforms the standard RNN on handwriting sequence generation using the IAM-OnDB dataset. |
| Researcher Affiliation | Academia | Junyoung Chung, Sungjin Ahn & Yoshua Bengio, Département d'informatique et de recherche opérationnelle, Université de Montréal, {junyoung.chung,sungjin.ahn,yoshua.bengio}@umontreal.ca |
| Pseudocode | No | The paper describes the operations (UPDATE, COPY, FLUSH) and their mathematical formulations (Eq. 1-7) but does not include a block explicitly labeled as "Pseudocode" or "Algorithm". (A hedged sketch of these operations is given after the table.) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the proposed HM-RNN model is openly available. |
| Open Datasets | Yes | We evaluate our model on three benchmark text corpora: (1) Penn Treebank, (2) Text8 and (3) Hutter Prize Wikipedia. ... We process the Penn Treebank dataset (Marcus et al., 1993)... The Text8 dataset (Mahoney, 2009)... The Hutter Prize Wikipedia (enwik8) dataset (Hutter, 2012)... We extend the evaluation of the HM-LSTM to a real-valued sequence modelling task using the IAM-OnDB (Liwicki & Bunke, 2005) dataset. |
| Dataset Splits | Yes | We use 10,465 sequences for training, 581 for validation, and 582 for test. The average length of the sequences is 648. (IAM-OnDB) AND We follow the data splits used in Graves (2013), where the first 90M characters are used to train the model, the next 5M characters for validation, and the remainder for the test set. (Hutter Prize Wikipedia; see the split sketch after the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to conduct the experiments. |
| Software Dependencies | No | The authors thank the developers of Theano (Team et al., 2016). However, a specific version number for Theano or any other critical software dependency is not provided. |
| Experiment Setup | Yes | Each update is done by using a mini-batch of 64 examples of length 100... We train the model using Adam (Kingma & Ba, 2014) with an initial learning rate of 0.002. We divide the learning rate by a factor of 50 when the validation negative log-likelihood stops decreasing. The norm of the gradient is clipped with a threshold of 1... We increased the slope a with the schedule a = min(5, 1 + 0.04 * N_epoch)... For the Penn Treebank dataset, we use 512 units in each layer of the HM-LSTM and for the output embedding layer. (A configuration sketch follows the table.) |
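
The Pseudocode row notes that the UPDATE, COPY, and FLUSH operations are given only as equations (Eq. 1-7). Below is a minimal NumPy sketch of a single HM-LSTM layer step that selects between those three operations. The weight names (`W`, `U`, `V`, `b`), the scalar handling of the boundary pre-activation, and the inference-time thresholding of the boundary detector are assumptions for illustration; this is not the authors' Theano implementation, which additionally trains the boundary with a straight-through estimator.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigm(x, a):
    # Hard sigmoid with slope a, used for the boundary detector.
    return np.clip((a * x + 1.0) / 2.0, 0.0, 1.0)

def hm_lstm_step(c_prev, h_prev, z_prev, h_below, z_below, h_above, params, slope):
    """One step of one HM-LSTM layer, choosing between UPDATE, COPY and FLUSH.

    c_prev, h_prev   -- this layer's previous cell and hidden state, shape (n,)
    z_prev           -- this layer's boundary at the previous step (0.0 or 1.0)
    z_below, h_below -- boundary and hidden state coming up from the layer below
    h_above          -- previous hidden state of the layer above (top-down input)
    params           -- dict with weights W (4n+1, d_below), U (4n+1, n),
                        V (4n+1, d_above) and bias b (4n+1,) (hypothetical names)
    slope            -- current slope of the hard sigmoid (annealed over epochs)
    """
    W, U, V, b = params["W"], params["U"], params["V"], params["b"]
    n = c_prev.shape[0]

    # Pre-activations: the bottom-up term is gated by z_below, the top-down term by z_prev.
    s = (z_below * (W @ h_below)
         + U @ h_prev
         + z_prev * (V @ h_above)
         + b)
    f, i, o, g, z_tilde = np.split(s, [n, 2 * n, 3 * n, 4 * n])
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)

    if z_prev == 1.0:            # FLUSH: a boundary was detected here last step,
        c = i * g                #        so start a fresh segment (drop the old cell).
        h = o * np.tanh(c)
    elif z_below == 1.0:         # UPDATE: a new summary arrived from below,
        c = f * c_prev + i * g   #         so do an ordinary LSTM-style update.
        h = o * np.tanh(c)
    else:                        # COPY: nothing new from below, keep the state as is.
        c, h = c_prev, h_prev

    # Boundary detector: hard sigmoid on the extra pre-activation, binarized here by
    # simple thresholding (the paper trains it with a straight-through estimator).
    z = float(hard_sigm(z_tilde[0], slope) > 0.5)
    return c, h, z
```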
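
For the Hutter Prize Wikipedia split quoted in the Dataset Splits row, a helper like the following reproduces the 90M / 5M / remainder character split. The function name and the character-level slicing are assumptions; the paper only states the split sizes, following Graves (2013).

```python
def split_enwik8(data):
    """Split the enwik8 character stream as quoted above:
    first 90M characters for training, next 5M for validation, rest for test."""
    train = data[:90_000_000]
    valid = data[90_000_000:95_000_000]
    test = data[95_000_000:]
    return train, valid, test
```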
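
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The dictionary keys, the epoch-indexed slope schedule, and the NumPy gradient-clipping helper are illustrative assumptions; the original experiments were run in Theano and no code is released.

```python
import numpy as np

# Hyperparameters as quoted in the Experiment Setup row (key names are hypothetical).
CONFIG = {
    "batch_size": 64,          # mini-batch of 64 examples
    "seq_length": 100,         # sequences of length 100 per update
    "optimizer": "Adam",
    "learning_rate": 0.002,    # initial learning rate
    "lr_divide_factor": 50,    # applied when validation NLL stops decreasing
    "grad_clip_norm": 1.0,     # gradient-norm clipping threshold
    "hidden_units": 512,       # per HM-LSTM layer and output embedding (Penn Treebank)
}

def slope(n_epoch):
    # Slope-annealing schedule for the boundary detector's hard sigmoid:
    # a = min(5, 1 + 0.04 * N_epoch).
    return min(5.0, 1.0 + 0.04 * n_epoch)

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale a list of gradient arrays so their global L2 norm is at most max_norm.
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-8))
    return [g * scale for g in grads]
```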