Learning Hierarchical Information Flow with Recurrent Neural Modules
Authors: Danijar Hafner, Alex Irpan, James Davidson, Nicolas Heess
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our model learns to route information hierarchically, processing input data by a chain of modules. We observe common architectures, such as feed forward neural networks and skip connections, emerging as special cases of our architecture, while novel connectivity patterns are learned for the text8 compression task. Our model outperforms standard recurrent neural networks on several sequential benchmarks. |
| Researcher Affiliation | Industry | Danijar Hafner (Google Brain, mail@danijar.com); Alex Irpan (Google Brain, alexirpan@google.com); James Davidson (Google Brain, jcdavidson@google.com); Nicolas Heess (Google DeepMind, heess@google.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link, an explicit code release statement, or code in supplementary materials, for the methodology described. |
| Open Datasets | Yes | Sequential Permuted MNIST. We use images from the MNIST [19] data set, permute the pixels of every image by a fixed random permutation, and show them to the model as a sequence of rows. Sequential CIFAR-10. In a similar spirit, we use the CIFAR-10 [17] data set and feed images to the model row by row. Text8 Language Modeling. This text corpus, consisting of the first 10^8 bytes of the English Wikipedia, is commonly used as a language modeling benchmark for sequential models. |
| Dataset Splits | Yes | We use the standard split of 60,000 training images and 10,000 testing images. The data set contains 50,000 training images and 10,000 testing images. Following Cooijmans et al. [4], we train on the first 90% and evaluate performance on the following 5% of the corpus. |
| Hardware Specification | No | The paper mentions training durations (e.g., 'The training took about 8 days') but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions optimizers like RMSProp and Adam, but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | For all models, we pick the largest layer sizes such that the number of parameters does not exceed 50,000. Training is performed for 100 epochs on batches of size 50 using RMSProp [30] with a learning rate of 10^-3. We train on batches of 100 sequences, each containing 200 bytes, using the Adam optimizer [15] with a default learning rate of 10^-3. We scale down gradients exceeding a norm of 1. Models are trained for 50 epochs on batches of size 10 containing sequences of length 50 using RMSProp with a learning rate of 10^-3. |
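
To make the data handling quoted in the Open Datasets and Dataset Splits rows concrete, here is a minimal NumPy sketch. It is not the authors' code; the function names, the permutation seed, and the array shapes are assumptions. It only mirrors the quoted description: every MNIST image is permuted by one fixed random pixel permutation and presented as a sequence of rows, and text8 is split into the first 90% for training and the following 5% for evaluation.

```python
import numpy as np

def permuted_mnist_sequence(images, seed=0):
    """Apply one fixed random permutation to every image's pixels, then
    present each image as a sequence of 28 rows with 28 values per step.
    `images` is assumed to have shape (num_images, 28, 28)."""
    rng = np.random.RandomState(seed)          # seed is an illustrative choice
    perm = rng.permutation(28 * 28)            # one permutation shared by all images
    flat = images.reshape(len(images), -1)[:, perm]
    return flat.reshape(len(images), 28, 28)

def text8_splits(corpus_bytes):
    """Train on the first 90% of text8 and evaluate on the following 5%,
    as in the split attributed to Cooijmans et al. [4] above."""
    n = len(corpus_bytes)
    train = corpus_bytes[: int(0.9 * n)]
    evaluation = corpus_bytes[int(0.9 * n): int(0.95 * n)]
    return train, evaluation
```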
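
Similarly, the Experiment Setup row can be read as the configuration sketched below. The dictionary keys and the `clip_by_global_norm` helper are illustrative, not the paper's implementation; the values mirror the quoted hyperparameters (a 50,000-parameter budget, RMSProp or Adam with learning rate 10^-3, and scaling down gradients whose norm exceeds 1).

```python
import numpy as np

# Hypothetical hyperparameter summaries assembled from the quoted setup.
MNIST_CIFAR_CONFIG = dict(optimizer="RMSProp", learning_rate=1e-3,
                          epochs=100, batch_size=50, max_parameters=50_000)
TEXT8_CONFIG = dict(optimizer="Adam", learning_rate=1e-3,
                    batch_size=100, sequence_length=200, clip_norm=1.0)

def clip_by_global_norm(gradients, clip_norm=1.0):
    """Scale all gradients down jointly when their global norm exceeds clip_norm
    ("we scale down gradients exceeding a norm of 1")."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in gradients))
    if global_norm > clip_norm:
        gradients = [g * (clip_norm / global_norm) for g in gradients]
    return gradients
```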