An Empirical Exploration of Recurrent Network Architectures

Authors: Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. (A GRU step is sketched in code after this table for orientation.)
Researcher Affiliation | Collaboration | Rafal Jozefowicz (RAFALJ@GOOGLE.COM), Google Inc.; Wojciech Zaremba (WOJ.ZAREMBA@GMAIL.COM), New York University / Facebook; Ilya Sutskever (ILYASU@GOOGLE.COM), Google Inc.
Pseudocode | No | The paper presents mathematical equations for RNN architectures and describes mutation rules in a list, but it does not contain structured pseudocode or algorithm blocks clearly labeled as 'Algorithm' or 'Pseudocode'.
Open Source Code | No | The paper does not provide any concrete access to source code (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described.
Open Datasets | Yes | Penn Tree-Bank (PTB). We also included a word-level language modelling task on the Penn Tree Bank (Marcus et al., 1993)... and Music. We used the polyphonic music datasets from Boulanger-Lewandowski et al. (2012). We evaluated the Nottingham and the Piano-midi datasets.
Dataset Splits | Yes | Penn Tree-Bank (PTB). We also included a word-level language modelling task on the Penn Tree Bank (Marcus et al., 1993) following the precise setup of Mikolov et al. (2010), which has 1M words with a vocabulary of size 10,000. and If we would observe no improvement in three consecutive epochs on the validation set, we would start lowering the learning rate by a factor of 2 at each epoch, for four additional epochs. (This schedule is sketched in code after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using Theano and Torch but does not provide specific version numbers for these or any other software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | For all problems, we used a minibatch of size 20, and unrolled the RNNs for 35 timesteps. and We used the following ranges for the hyperparameter search. The initialization scale is in {0.3, 0.7, 1, 1.4, 2, 2.8}... The learning rate was chosen from {0.1, 0.2, 0.3, 0.5, 1, 2, 5}... The maximal permissible norm of the gradient was set to {1, 2.5, 5, 10, 20}... The number of layers was chosen from {1, 2, 3, 4}... and dropout in {0.0, 0.1, 0.3, 0.5} (A hedged configuration-sampling sketch follows the table.)
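
The Research Type and Pseudocode rows note that the paper works with gated architectures (LSTM, GRU, and mutations of them) presented only as equations. For orientation, here is a minimal NumPy sketch of one GRU step in the standard Cho et al. (2014) formulation; the parameter names, shapes, and the exact gate convention are assumptions made here, not code or notation taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU update h_t = f(x_t, h_{t-1}).

    `params` is a dict of weight matrices and biases
    (Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh); these names are illustrative.
    """
    z = sigmoid(params["Wz"] @ x + params["Uz"] @ h_prev + params["bz"])  # update gate
    r = sigmoid(params["Wr"] @ x + params["Ur"] @ h_prev + params["br"])  # reset gate
    h_tilde = np.tanh(params["Wh"] @ x + params["Uh"] @ (r * h_prev) + params["bh"])
    # Gate conventions vary between papers; here z interpolates toward the candidate state.
    return (1.0 - z) * h_prev + z * h_tilde
```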
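The Dataset Splits row quotes the paper's validation-driven learning-rate schedule: once the validation score fails to improve for three consecutive epochs, the learning rate is halved at each epoch for four additional epochs. Below is a minimal sketch of that schedule, assuming hypothetical `train_epoch(lr)` and `validate()` callbacks; the quote does not say what happens after the decay phase, so this sketch simply stops training, which is an assumption.

```python
def run_with_lr_schedule(train_epoch, validate, lr=1.0,
                         patience=3, decay_epochs=4, factor=2.0):
    """Sketch of the quoted schedule, not the authors' actual training code."""
    best = float("inf")   # best validation loss/perplexity seen so far (lower is better)
    bad_epochs = 0
    # Phase 1: train at a constant learning rate until `patience` consecutive
    # epochs pass with no validation improvement.
    while bad_epochs < patience:
        train_epoch(lr)
        score = validate()
        if score < best:
            best, bad_epochs = score, 0
        else:
            bad_epochs += 1
    # Phase 2: lower the learning rate by `factor` at each of `decay_epochs` epochs.
    for _ in range(decay_epochs):
        lr /= factor
        train_epoch(lr)
        best = min(best, validate())
    return best
```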
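The Experiment Setup row lists the discrete hyperparameter ranges used in the search, plus the fixed minibatch size (20) and unrolling length (35). The sketch below only encodes those quoted ranges; the random sampling strategy and the configuration dictionary format are assumptions for illustration, not the authors' search procedure.

```python
import random

# Discrete ranges quoted in the Experiment Setup row.
SEARCH_SPACE = {
    "init_scale":    [0.3, 0.7, 1, 1.4, 2, 2.8],
    "learning_rate": [0.1, 0.2, 0.3, 0.5, 1, 2, 5],
    "max_grad_norm": [1, 2.5, 5, 10, 20],
    "num_layers":    [1, 2, 3, 4],
    "dropout":       [0.0, 0.1, 0.3, 0.5],
}

def sample_config(rng=random):
    """Draw one hyperparameter configuration (uniform sampling is an assumption)."""
    cfg = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    cfg.update(batch_size=20, unroll_steps=35)  # fixed for all problems per the paper
    return cfg

if __name__ == "__main__":
    # Example: draw a few candidate configurations to evaluate.
    for _ in range(5):
        print(sample_config())
```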