Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
An Empirical Exploration of Recurrent Network Architectures
Authors: Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. |
| Researcher Affiliation | Collaboration | Rafal Jozefowicz (Google Inc.); Wojciech Zaremba (New York University, Facebook); Ilya Sutskever (Google Inc.) |
| Pseudocode | No | The paper presents mathematical equations for RNN architectures and describes mutation rules in a list, but it does not contain structured pseudocode or algorithm blocks clearly labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | "Penn Tree-Bank (PTB). We also included a word-level language modelling task on the Penn Tree Bank (Marcus et al., 1993)..." and "Music. We used the polyphonic music datasets from Boulanger-Lewandowski et al. (2012). We evaluated the Nottingham and the Piano-midi datasets." |
| Dataset Splits | Yes | "Penn Tree-Bank (PTB). We also included a word-level language modelling task on the Penn Tree Bank (Marcus et al., 1993) following the precise setup of Mikolov et al. (2010), which has 1M words with a vocabulary of size 10,000." and "If we would observe no improvement in three consecutive epochs on the validation set, we would start lowering the learning rate by a factor of 2 at each epoch, for four additional epochs." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Theano and Torch but does not provide specific version numbers for these or any other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | "For all problems, we used a minibatch of size 20, and unrolled the RNNs for 35 timesteps." and "We used the following ranges for the hyperparameter search. The initialization scale is in {0.3, 0.7, 1, 1.4, 2, 2.8}... The learning rate was chosen from {0.1, 0.2, 0.3, 0.5, 1, 2, 5}... The maximal permissible norm of the gradient was set to {1, 2.5, 5, 10, 20}... The number of layers was chosen from {1, 2, 3, 4}... and dropout in {0.0, 0.1, 0.3, 0.5}" |
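
The values quoted in the Dataset Splits and Experiment Setup rows can be written down as a small configuration sketch. The Python below is illustrative only and is not the authors' code: the names `SEARCH_SPACE`, `FIXED`, `all_configs`, `sample_config`, and `learning_rate_schedule` are hypothetical. The quoted text does not say how combinations were drawn from the ranges, so both exhaustive enumeration and uniform random sampling are shown as assumptions. The schedule function is one possible reading of the quoted learning-rate rule (halve the rate once per epoch, for four epochs, after three consecutive epochs without validation improvement).

```python
import itertools
import random

# Ranges quoted in the "Experiment Setup" row.
SEARCH_SPACE = {
    "init_scale": [0.3, 0.7, 1, 1.4, 2, 2.8],
    "learning_rate": [0.1, 0.2, 0.3, 0.5, 1, 2, 5],
    "max_grad_norm": [1, 2.5, 5, 10, 20],
    "num_layers": [1, 2, 3, 4],
    "dropout": [0.0, 0.1, 0.3, 0.5],
}

# Fixed settings quoted in the same row.
FIXED = {"batch_size": 20, "unroll_steps": 35}


def all_configs(space=SEARCH_SPACE):
    """Yield every combination of the quoted ranges as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield {**FIXED, **dict(zip(keys, values))}


def sample_config(rng, space=SEARCH_SPACE):
    """Draw one configuration uniformly at random (assumed sampling scheme)."""
    return {**FIXED, **{k: rng.choice(v) for k, v in space.items()}}


def learning_rate_schedule(val_losses, base_lr, patience=3, decay_epochs=4):
    """Return the learning rate used in each epoch.

    Assumed reading of the quoted rule: after `patience` consecutive epochs
    without a new best validation loss, halve the rate at every following
    epoch for `decay_epochs` epochs, then stop training.
    """
    lrs, lr = [], base_lr
    best, stale = float("inf"), 0
    decaying, remaining = False, decay_epochs
    for loss in val_losses:
        if decaying:
            if remaining == 0:
                break  # all decay epochs have been used
            lr /= 2
            remaining -= 1
        lrs.append(lr)
        if not decaying:
            if loss < best:
                best, stale = loss, 0
            else:
                stale += 1
                decaying = stale >= patience
    return lrs


if __name__ == "__main__":
    print(sum(1 for _ in all_configs()), "configurations in the full grid")
    print(sample_config(random.Random(0)))
    print(learning_rate_schedule([5, 4, 4, 4, 4, 4, 4, 4, 4, 4], base_lr=1.0))
```

Enumerating the full grid is cheap here because only the configuration dictionaries are produced; training a model per configuration is a separate cost that this sketch does not address.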