Gated Feedback Recurrent Neural Networks
Authors: Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed GF-RNN with different types of recurrent units, such as tanh, long short-term memory and gated recurrent units, on the tasks of character-level language modeling and Python program evaluation. Our empirical evaluation of different RNN units revealed that in both tasks, the GF-RNN outperforms the conventional approaches to building deep stacked RNNs. |
| Researcher Affiliation | Academia | Junyoung Chung (JUNYOUNG.CHUNG@UMONTREAL.CA), Caglar Gulcehre (CAGLAR.GULCEHRE@UMONTREAL.CA), Kyunghyun Cho (KYUNGHYUN.CHO@UMONTREAL.CA), Yoshua Bengio (FIND-ME@THE.WEB), Dept. IRO, Université de Montréal, CIFAR Senior Fellow |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | Yes | We used the dataset made available as a part of the human knowledge compression contest (Hutter, 2012). We refer to this dataset as the Hutter dataset. The dataset, which was built from English Wikipedia, contains 100 MBytes of characters which include Latin alphabets, non-Latin alphabets, XML markups and special characters. |
| Dataset Splits | Yes | We used the first 90 MBytes of characters to train a model, the next 5 MBytes as a validation set, and the remaining as a test set, with the vocabulary of 205 characters including a token for an unknown character. (A data-preparation sketch based on this split follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions Theano (Bastien et al., 2012) and Pylearn2 (Goodfellow et al., 2013) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | According to the preliminary experiments and their results on the validation set, we used a learning rate of 0.001 and momentum coefficient of 0.9 when training the models having either GRU or LSTM units. It was necessary to choose a much smaller learning rate of 5 × 10⁻⁵ in the case of tanh units to ensure the stability of learning. Whenever the norm of the gradient explodes, we halve the learning rate. Each update is done using a minibatch of 100 subsequences of length 100 each, to avoid memory overflow problems when unfolding in time for backprop. (A minimal training-loop sketch based on these settings follows the table.) |
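
The Dataset Splits row above fully specifies how the 100 MByte Hutter dataset is partitioned. As a rough aid to reproduction, the following is a minimal sketch, not the authors' code, of how that split could be rebuilt. The local file name `enwik8`, the byte-level treatment of "characters", and the use of decimal megabytes (the Hutter Prize dump is exactly 10^8 bytes) are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): split the 100 MByte Hutter Prize /
# enwik8 dump into the 90 MB / 5 MB / remainder partitions quoted in the
# Dataset Splits row. The file name "enwik8", decimal megabytes, and the
# byte-level character handling below are assumptions.
import numpy as np

MB = 10 ** 6  # decimal megabytes assumed; enwik8 is exactly 10^8 bytes

with open("enwik8", "rb") as f:
    data = f.read()

train_bytes = data[: 90 * MB]
valid_bytes = data[90 * MB : 95 * MB]
test_bytes = data[95 * MB :]

# Build the character vocabulary from the training portion and reserve one
# extra id for unknown characters, matching the quoted "vocabulary of 205
# characters including a token for an unknown character".
chars = sorted(set(train_bytes))
char_to_id = {c: i for i, c in enumerate(chars)}
unk_id = len(char_to_id)  # id for any character not seen in training


def encode(byte_seq):
    """Map raw bytes to integer ids, sending unseen characters to unk_id."""
    return np.array([char_to_id.get(b, unk_id) for b in byte_seq], dtype=np.int32)


train_ids = encode(train_bytes)
valid_ids = encode(valid_bytes)
test_ids = encode(test_bytes)
print(len(train_ids), len(valid_ids), len(test_ids), len(char_to_id) + 1)
```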
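
The Experiment Setup row quotes the optimizer settings, but the paper's Theano implementation is not released (see the Open Source Code row). The sketch below illustrates a plain SGD-with-momentum loop consistent with those quoted settings; the `model` interface (`model.params`, `model.backward`), the gradient-explosion threshold, and the number of updates are hypothetical placeholders, not details from the paper.

```python
# Minimal sketch (assumptions flagged inline) of the update rule quoted in
# the Experiment Setup row: SGD with momentum, a per-unit-type learning
# rate, and halving of the learning rate when the gradient norm "explodes".
import numpy as np

BATCH_SIZE = 100      # 100 subsequences per minibatch (from the paper)
SEQ_LEN = 100         # each subsequence is 100 characters long (from the paper)
MOMENTUM = 0.9        # momentum coefficient for GRU/LSTM models (from the paper)
LEARNING_RATES = {"gru": 1e-3, "lstm": 1e-3, "tanh": 5e-5}  # from the paper
EXPLOSION_THRESHOLD = 1.0  # assumption: the paper does not state a threshold


def sgd_momentum_step(params, grads, velocity, lr, momentum=MOMENTUM):
    """One SGD-with-momentum update over a flat dict of parameter arrays."""
    for name in params:
        velocity[name] = momentum * velocity[name] - lr * grads[name]
        params[name] += velocity[name]


def train(model, data_iterator, unit_type="gru", n_updates=1000):
    lr = LEARNING_RATES[unit_type]
    params = model.params                      # hypothetical model interface
    velocity = {k: np.zeros_like(v) for k, v in params.items()}

    for _, (x, y) in zip(range(n_updates), data_iterator):
        # x, y: int arrays of shape (BATCH_SIZE, SEQ_LEN) from the split above.
        grads = model.backward(x, y)           # hypothetical: returns dict of grads

        grad_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
        if grad_norm > EXPLOSION_THRESHOLD:
            # "Whenever the norm of the gradient explodes, we halve the
            # learning rate" -- the detection criterion here is an assumption.
            lr *= 0.5

        sgd_momentum_step(params, grads, velocity, lr)
```

The paper does not say how an "exploding" gradient norm is detected, so the fixed threshold above is a guess; any reproduction would need to tune that criterion (or check for NaNs) on the validation set, as the authors tuned their other hyperparameters.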