Architectural Complexity Measures of Recurrent Neural Networks

Authors: Saizheng Zhang, Yuhuai Wu, Tong Che, Zhouhan Lin, Roland Memisevic, Russ R. Salakhutdinov, Yoshua Bengio

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that RNNs might benefit from larger recurrent depth and feedforward depth. We further demonstrate that increasing the recurrent skip coefficient offers performance boosts on long-term dependency problems. We empirically evaluate models with different recurrent/feedforward depths and recurrent skip coefficients on various sequential modelling tasks. We also show that our experimental results further validate the usefulness of the proposed definitions. (A sketch of a recurrent skip connection follows the table.)
Researcher Affiliation | Academia | MILA, Université de Montréal; University of Toronto; Carnegie Mellon University; Institut des Hautes Études Scientifiques, France; CIFAR
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement or a direct link to the source code for the methodology described. It acknowledges Theano and Keras but does not provide its own code.
Open Datasets | Yes | Penn Treebank dataset: We evaluate our models on character level language modelling using the Penn Treebank dataset [22]. text8 dataset: Another dataset used for character level language modelling is the text8 dataset, which contains 100M characters from Wikipedia with an alphabet size of 27. Adding problem: the adding problem (and the copying memory problem that follows it) was introduced in [10]. Copying memory problem: Each input sequence has length of T + 20... Sequential MNIST dataset: Each MNIST image is reshaped into a 784 × 1 sequence, turning the digit classification task into a sequence classification one with long-term dependencies [25, 24]. (A data-generation sketch for the adding problem follows the table.)
Dataset Splits | Yes | Penn Treebank dataset: It contains 5059k characters for training, 396k for validation and 446k for test, and has an alphabet size of 50.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper mentions using Adam [26] for optimization, and acknowledges Theano [28] and Keras [29], but does not provide specific version numbers for these or other software components crucial for replication.
Experiment Setup | Yes | For all of our experiments we use Adam [26] for optimization, and conduct a grid search on the learning rate in {10^-2, 10^-3, 10^-4, 10^-5}. For tanh RNNs, the parameters are initialized with samples from a uniform distribution. For LSTM networks we adopt a similar initialization scheme, while the forget gate biases are chosen by a grid search over {-5, -3, -1, 0, 1, 3, 5}. We employ early stopping and the batch size was set to 50. (A grid-search sketch follows the table.)
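
The recurrent skip coefficient discussed under Research Type concerns how far back in time a hidden state can directly reach in the unfolded graph. Below is a minimal NumPy sketch of a tanh RNN whose state also receives a connection from s steps earlier; the function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def skip_rnn_forward(x_seq, W_x, W_h, W_skip, b, s=3):
    """Tanh RNN whose state at step t also sees the state from step t - s.

    x_seq : (T, input_dim) input sequence
    W_x   : (hidden_dim, input_dim) input-to-hidden weights
    W_h   : (hidden_dim, hidden_dim) hidden-to-hidden weights at lag 1
    W_skip: (hidden_dim, hidden_dim) hidden-to-hidden weights at lag s
    b     : (hidden_dim,) bias
    s     : length of the recurrent skip connection (illustrative choice)
    """
    T = x_seq.shape[0]
    hidden_dim = b.shape[0]
    h = np.zeros((T + 1, hidden_dim))  # h[0] is the zero initial state
    for t in range(1, T + 1):
        h_skip = h[t - s] if t - s >= 0 else np.zeros(hidden_dim)
        h[t] = np.tanh(W_x @ x_seq[t - 1] + W_h @ h[t - 1] + W_skip @ h_skip + b)
    return h[1:]
```

The intuition behind the measure: with connections at lag s, information can cross a span of T time steps in roughly T/s recurrent transitions instead of T, which is why larger skip coefficients help on long-term dependency tasks.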
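
The adding problem quoted under Open Datasets is the benchmark introduced in [10]; the paper itself gives no generator, so the sketch below assumes the usual two-channel formulation (a channel of random values plus a 0/1 marker channel, with the sum of the two marked values as the target). Some variants additionally restrict where the markers may fall.

```python
import numpy as np

def make_adding_example(T, rng=np.random):
    """One example of the two-channel adding problem (standard formulation).

    Channel 0: T values drawn uniformly from [0, 1].
    Channel 1: all zeros except two marker positions set to 1.
    Target   : sum of the two marked values from channel 0.
    """
    values = rng.uniform(0.0, 1.0, size=T)
    markers = np.zeros(T)
    i, j = rng.choice(T, size=2, replace=False)
    markers[i] = markers[j] = 1.0
    x = np.stack([values, markers], axis=1)  # shape (T, 2)
    y = values[i] + values[j]
    return x, y
```

Sequential MNIST, by contrast, needs no generator: as the quote states, each 28 × 28 image is simply flattened into a 784 × 1 sequence, e.g. `x.reshape(784, 1)`.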
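
The Experiment Setup row lists a search space but no released training script. A hypothetical scaffold for that grid search is sketched below; `train_and_evaluate` is a placeholder standing in for a full training run with Adam and early stopping, and its name, signature, and the lower-is-better assumption are ours, not the authors'.

```python
import itertools

LEARNING_RATES = [1e-2, 1e-3, 1e-4, 1e-5]      # learning-rate grid from the paper
FORGET_GATE_BIASES = [-5, -3, -1, 0, 1, 3, 5]  # searched for LSTM models only
BATCH_SIZE = 50

def grid_search(train_and_evaluate, use_lstm):
    """Return the best validation score and its hyperparameters.

    train_and_evaluate(lr, forget_bias, batch_size) -> validation score
    is a placeholder for one training run with Adam and early stopping.
    """
    biases = FORGET_GATE_BIASES if use_lstm else [None]
    best = None
    for lr, fb in itertools.product(LEARNING_RATES, biases):
        score = train_and_evaluate(lr=lr, forget_bias=fb, batch_size=BATCH_SIZE)
        if best is None or score < best[0]:  # assumes lower is better (e.g. BPC or error)
            best = (score, {"lr": lr, "forget_bias": fb})
    return best
```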