Gated Feedback Recurrent Neural Networks
Authors: Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed GF-RNN with different types of recurrent units, such as tanh, long short-term memory and gated recurrent units, on the tasks of character-level language modeling and Python program evaluation. Our empirical evaluation of different RNN units revealed that in both tasks, the GF-RNN outperforms the conventional approaches to building deep stacked RNNs. |
| Researcher Affiliation | Academia | Junyoung Chung (JUNYOUNG.CHUNG@UMONTREAL.CA), Caglar Gulcehre (CAGLAR.GULCEHRE@UMONTREAL.CA), Kyunghyun Cho (KYUNGHYUN.CHO@UMONTREAL.CA), Yoshua Bengio (FIND-ME@THE.WEB), Dept. IRO, Université de Montréal, CIFAR Senior Fellow |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | Yes | We used the dataset made available as a part of the human knowledge compression contest (Hutter, 2012). We refer to this dataset as the Hutter dataset. The dataset, which was built from English Wikipedia, contains 100 MBytes of characters which include Latin alphabets, non-Latin alphabets, XML markups and special characters. |
| Dataset Splits | Yes | We used the first 90 MBytes of characters to train a model, the next 5 MBytes as a validation set, and the remaining as a test set, with the vocabulary of 205 characters including a token for an unknown character. (A data-preparation sketch based on this split follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions Theano (Bastien et al., 2012) and Pylearn2 (Goodfellow et al., 2013) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | According to the preliminary experiments and their results on the validation set, we used a learning rate of 0.001 and momentum coefficient of 0.9 when training the models having either GRU or LSTM units. It was necessary to choose a much smaller learning rate of 5 × 10⁻⁵ in the case of tanh units to ensure the stability of learning. Whenever the norm of the gradient explodes, we halve the learning rate. Each update is done using a minibatch of 100 subsequences of length 100 each, to avoid memory overflow problems when unfolding in time for backprop. (A minimal training-loop sketch based on these settings follows the table.) |
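
The Dataset Splits row above fully specifies how the 100 MByte Hutter dataset is partitioned. As a rough aid to reproduction, the following is a minimal sketch, not the authors' code, of how that split could be rebuilt. The local file name `enwik8`, the byte-level treatment of "characters", and the use of decimal megabytes (the Hutter Prize dump is exactly 10^8 bytes) are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): split the 100 MByte Hutter Prize /
# enwik8 dump into the 90 MB / 5 MB / remainder partitions quoted in the
# Dataset Splits row. The file name "enwik8", decimal megabytes, and the
# byte-level character handling below are assumptions.
import numpy as np

MB = 10 ** 6  # decimal megabytes assumed; enwik8 is exactly 10^8 bytes

with open("enwik8", "rb") as f:
    data = f.read()

train_bytes = data[: 90 * MB]
valid_bytes = data[90 * MB : 95 * MB]
test_bytes = data[95 * MB :]

# Build the character vocabulary from the training portion and reserve one
# extra id for unknown characters, matching the quoted "vocabulary of 205
# characters including a token for an unknown character".
chars = sorted(set(train_bytes))
char_to_id = {c: i for i, c in enumerate(chars)}
unk_id = len(char_to_id)  # id for any character not seen in training


def encode(byte_seq):
    """Map raw bytes to integer ids, sending unseen characters to unk_id."""
    return np.array([char_to_id.get(b, unk_id) for b in byte_seq], dtype=np.int32)


train_ids = encode(train_bytes)
valid_ids = encode(valid_bytes)
test_ids = encode(test_bytes)
print(len(train_ids), len(valid_ids), len(test_ids), len(char_to_id) + 1)
```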
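
The Experiment Setup row quotes the optimizer settings, but the paper's Theano implementation is not released (see the Open Source Code row). The sketch below illustrates a plain SGD-with-momentum loop consistent with those quoted settings; the `model` interface (`model.params`, `model.backward`), the gradient-explosion threshold, and the number of updates are hypothetical placeholders, not details from the paper.

```python
# Minimal sketch (assumptions flagged inline) of the update rule quoted in
# the Experiment Setup row: SGD with momentum, a per-unit-type learning
# rate, and halving of the learning rate when the gradient norm "explodes".
import numpy as np

BATCH_SIZE = 100      # 100 subsequences per minibatch (from the paper)
SEQ_LEN = 100         # each subsequence is 100 characters long (from the paper)
MOMENTUM = 0.9        # momentum coefficient for GRU/LSTM models (from the paper)
LEARNING_RATES = {"gru": 1e-3, "lstm": 1e-3, "tanh": 5e-5}  # from the paper
EXPLOSION_THRESHOLD = 1.0  # assumption: the paper does not state a threshold


def sgd_momentum_step(params, grads, velocity, lr, momentum=MOMENTUM):
    """One SGD-with-momentum update over a flat dict of parameter arrays."""
    for name in params:
        velocity[name] = momentum * velocity[name] - lr * grads[name]
        params[name] += velocity[name]


def train(model, data_iterator, unit_type="gru", n_updates=1000):
    lr = LEARNING_RATES[unit_type]
    params = model.params                      # hypothetical model interface
    velocity = {k: np.zeros_like(v) for k, v in params.items()}

    for _, (x, y) in zip(range(n_updates), data_iterator):
        # x, y: int arrays of shape (BATCH_SIZE, SEQ_LEN) from the split above.
        grads = model.backward(x, y)           # hypothetical: returns dict of grads

        grad_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
        if grad_norm > EXPLOSION_THRESHOLD:
            # "Whenever the norm of the gradient explodes, we halve the
            # learning rate" -- the detection criterion here is an assumption.
            lr *= 0.5

        sgd_momentum_step(params, grads, velocity, lr)
```

The paper does not say how an "exploding" gradient norm is detected, so the fixed threshold above is a guess; any reproduction would need to tune that criterion (or check for NaNs) on the validation set, as the authors tuned their other hyperparameters.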