An Empirical Exploration of Recurrent Network Architectures
Authors: Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. |
| Researcher Affiliation | Collaboration | Rafal Jozefowicz (rafalj@google.com), Google Inc.; Wojciech Zaremba (woj.zaremba@gmail.com), New York University and Facebook; Ilya Sutskever (ilyasu@google.com), Google Inc. |
| Pseudocode | No | The paper presents mathematical equations for RNN architectures and describes mutation rules in a list, but it does not contain structured pseudocode or algorithm blocks clearly labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | Penn Tree-Bank (PTB). We also included a word-level language modelling task on the Penn Tree Bank (Marcus et al., 1993)... and Music. We used the polyphonic music datasets from Boulanger-Lewandowski et al. (2012). We evaluated the Nottingham and the Piano-midi datasets. |
| Dataset Splits | Yes | Penn Tree-Bank (PTB). We also included a word-level language modelling task on the Penn Tree Bank (Marcus et al., 1993) following the precise setup of Mikolov et al. (2010), which has 1M words with a vocabulary of size 10,000. and If we would observe no improvement in three consecutive epochs on the validation set, we would start lowering the learning rate by a factor of 2 at each epoch, for four additional epochs. (A sketch of this learning-rate schedule is given after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Theano and Torch but does not provide specific version numbers for these or any other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For all problems, we used a minibatch of size 20, and unrolled the RNNs for 35 timesteps. and We used the following ranges for the hyperparameter search. The initialization scale is in {0.3, 0.7, 1, 1.4, 2, 2.8}... The learning rate was chosen from {0.1, 0.2, 0.3, 0.5, 1, 2, 5}... The maximal permissible norm of the gradient was set to {1, 2.5, 5, 10, 20}... The number of layers was chosen from {1, 2, 3, 4}... and dropout in {0.0, 0.1, 0.3, 0.5} (A sketch of sampling a configuration from these ranges is given after the table.) |
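
The learning-rate schedule quoted in the Dataset Splits row can be made concrete. The Python sketch below is one possible reading of that description; the callback names (`train_one_epoch`, `validate`) and the loop structure are illustrative assumptions, not from the paper.

```python
def train_with_lr_schedule(train_one_epoch, validate, initial_lr):
    """One possible reading of the quoted schedule; names are illustrative."""
    lr = initial_lr
    best_val = float("inf")
    bad_epochs = 0            # consecutive epochs with no validation improvement
    decay_epochs_left = None  # set to 4 once the decay phase begins

    while True:
        if decay_epochs_left is not None:
            lr /= 2.0         # lower the learning rate by a factor of 2 each epoch
            decay_epochs_left -= 1

        train_one_epoch(lr)
        val_loss = validate()

        if decay_epochs_left is not None:
            if decay_epochs_left == 0:
                return lr     # stop after the four additional (decayed) epochs
            continue

        if val_loss < best_val:
            best_val = val_loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= 3:        # no improvement in three consecutive epochs
                decay_epochs_left = 4  # begin the four-epoch decay phase
```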
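
The hyperparameter ranges quoted in the Experiment Setup row can likewise be written out directly. The sketch below assumes a simple random draw over those grids; only the value sets, the minibatch size of 20, and the 35-step unrolling come from the quoted text, while the sampling wrapper itself is an assumption.

```python
import random

# Value grids quoted in the Experiment Setup row; the random-search wrapper is an assumption.
SEARCH_SPACE = {
    "init_scale":    [0.3, 0.7, 1, 1.4, 2, 2.8],    # initialization scale
    "learning_rate": [0.1, 0.2, 0.3, 0.5, 1, 2, 5],
    "max_grad_norm": [1, 2.5, 5, 10, 20],            # maximal permissible gradient norm
    "num_layers":    [1, 2, 3, 4],
    "dropout":       [0.0, 0.1, 0.3, 0.5],
}

FIXED = {"batch_size": 20, "num_steps": 35}          # minibatch of 20, unrolled for 35 timesteps

def sample_config(rng=random):
    """Draw one hyperparameter configuration uniformly from the grids above."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    config.update(FIXED)
    return config

if __name__ == "__main__":
    print(sample_config())
```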