Recurrent Highway Networks
Authors: Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutník, Jürgen Schmidhuber
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Several language modeling experiments demonstrate that the proposed architecture results in powerful and efficient models. On the Penn Treebank corpus, solely increasing the transition depth from 1 to 10 improves word-level perplexity from 90.6 to 65.4 using the same number of parameters. On the larger Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform all previous results and achieve an entropy of 1.27 bits per character. (How these two metrics relate to cross-entropy is sketched after this table.) |
| Researcher Affiliation | Collaboration | (1) ETH Zürich, Switzerland; (2) The Swiss AI Lab IDSIA (USI-SUPSI) & NNAISENSE, Switzerland. |
| Pseudocode | No | The paper provides mathematical Equations (6)–(9) and a schematic illustration (Figure 3) of the RHN computation, but it does not include pseudocode or an algorithm block. (A hedged sketch of the recurrence appears after this table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | On the Penn Treebank corpus (Marcus et al., 1993) preprocessed by Mikolov et al. (2010)... Hutter Prize Wikipedia datasets text8 and enwik8 (Hutter, 2012)... JSB Chorales polyphonic music prediction dataset (Boulanger-Lewandowski et al., 2012). |
| Dataset Splits | No | The paper mentions using a validation set for the Penn Treebank and Wikipedia datasets (e.g., 'For each depth, we show the test set perplexity of the best model based on performance on the validation set'), but it does not specify exact split percentages or example counts for training, validation, and testing in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'SGD-based optimization' and regularization techniques such as 'dropout' and 'weight-tying (WT)', but it does not name any specific software libraries or version numbers needed for replication. |
| Experiment Setup | No | The paper states, 'Detailed configurations for all experiments are included in the supplementary material.' This indicates that the specific experimental setup details, such as concrete hyperparameter values, are not provided within the main text. |
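
For context on the figures quoted in the Research Type row: word-level perplexity and bits per character (BPC) are both monotone transforms of the model's mean cross-entropy, so the paper's 65.4 perplexity and 1.27 BPC results are directly comparable to loss values. A minimal sketch of the standard conversions (the function names are ours; the paper does not restate these definitions):

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """Word-level perplexity from mean negative log-likelihood in nats."""
    return math.exp(mean_nll_nats)

def bits_per_character(mean_nll_nats: float) -> float:
    """Character-level entropy in bits per character (BPC)."""
    return mean_nll_nats / math.log(2)

# The paper's best PTB perplexity of 65.4 corresponds to a mean
# cross-entropy of ln(65.4) ≈ 4.18 nats per word ...
print(round(perplexity(4.18), 1))          # ≈ 65.4
# ... and its 1.27 BPC on enwik8 corresponds to ≈ 0.88 nats per character.
print(round(bits_per_character(0.88), 2))  # ≈ 1.27
```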
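
The Pseudocode row notes that the RHN computation is specified only via Equations (6)–(9) and Figure 3. As a reading aid, here is a minimal NumPy sketch of one RHN time step with transition depth L, using the coupled-gate simplification (c = 1 − t) that the paper mentions using in its experiments; all names (`rhn_step`, `Wh`, `Rh`, and so on) are ours, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rhn_step(x, s_prev, Wh, Wt, Rh, Rt, bh, bt):
    """One RHN time step with L stacked highway layers.

    x      : input at this time step, shape (n_in,)
    s_prev : state from the previous time step, shape (n,)
    Wh, Wt : input projections, shape (n, n_in); applied at layer 1 only
    Rh, Rt : per-layer recurrent matrices, shape (L, n, n)
    bh, bt : per-layer biases, shape (L, n)
    """
    L = Rh.shape[0]
    s = s_prev
    for layer in range(L):
        # The input x enters only the first highway layer
        # (the indicator term in Equations (6)-(9)).
        h_in = Wh @ x if layer == 0 else 0.0
        t_in = Wt @ x if layer == 0 else 0.0
        h = np.tanh(h_in + Rh[layer] @ s + bh[layer])  # candidate update H
        t = sigmoid(t_in + Rt[layer] @ s + bt[layer])  # transform gate T
        s = h * t + s * (1.0 - t)  # highway combination, coupled carry c = 1 - t
    return s  # also serves as the step's output y_t
```

For a full sequence, `s` would be initialized to zeros and `rhn_step` applied at each time step, with each step's final state serving as both the output and the next step's `s_prev`.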