Recurrent Highway Networks
Authors: Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutník, Jürgen Schmidhuber
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Several language modeling experiments demonstrate that the proposed architecture results in powerful and efficient models. On the Penn Treebank corpus, solely increasing the transition depth from 1 to 10 improves word-level perplexity from 90.6 to 65.4 using the same number of parameters. On the larger Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform all previous results and achieve an entropy of 1.27 bits per character. (How these two metrics relate to cross-entropy is sketched after this table.) |
| Researcher Affiliation | Collaboration | (1) ETH Zürich, Switzerland; (2) The Swiss AI Lab IDSIA (USI-SUPSI) & NNAISENSE, Switzerland. |
| Pseudocode | No | The paper provides mathematical Equations (6)–(9) and a schematic illustration (Figure 3) of the RHN computation, but it does not include pseudocode or an algorithm block. (A hedged sketch of the recurrence appears after this table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | On the Penn Treebank corpus (Marcus et al., 1993) preprocessed by Mikolov et al. (2010)... Hutter Prize Wikipedia datasets text8 and enwik8 (Hutter, 2012)... JSB Chorales polyphonic music prediction dataset (Boulanger-Lewandowski et al., 2012). |
| Dataset Splits | No | The paper mentions using a validation set for the Penn Treebank and Wikipedia datasets (e.g., 'For each depth, we show the test set perplexity of the best model based on performance on the validation set'), but it does not specify exact split percentages or example counts for training, validation, and testing in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'SGD-based optimization' and regularization techniques such as 'dropout' and 'weight-tying (WT)', but it does not name any specific software libraries or version numbers needed for replication. |
| Experiment Setup | No | The paper states, 'Detailed configurations for all experiments are included in the supplementary material.' This indicates that the specific experimental setup details, such as concrete hyperparameter values, are not provided within the main text. |
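
For context on the figures quoted in the Research Type row: word-level perplexity and bits per character (BPC) are both monotone transforms of the model's mean cross-entropy, so the paper's 65.4 perplexity and 1.27 BPC results are directly comparable to loss values. A minimal sketch of the standard conversions (the function names are ours; the paper does not restate these definitions):

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """Word-level perplexity from mean negative log-likelihood in nats."""
    return math.exp(mean_nll_nats)

def bits_per_character(mean_nll_nats: float) -> float:
    """Character-level entropy in bits per character (BPC)."""
    return mean_nll_nats / math.log(2)

# The paper's best PTB perplexity of 65.4 corresponds to a mean
# cross-entropy of ln(65.4) ≈ 4.18 nats per word ...
print(round(perplexity(4.18), 1))          # ≈ 65.4
# ... and its 1.27 BPC on enwik8 corresponds to ≈ 0.88 nats per character.
print(round(bits_per_character(0.88), 2))  # ≈ 1.27
```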
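
The Pseudocode row notes that the RHN computation is specified only via Equations (6)–(9) and Figure 3. As a reading aid, here is a minimal NumPy sketch of one RHN time step with transition depth L, using the coupled-gate simplification (c = 1 − t) that the paper mentions using in its experiments; all names (`rhn_step`, `Wh`, `Rh`, and so on) are ours, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rhn_step(x, s_prev, Wh, Wt, Rh, Rt, bh, bt):
    """One RHN time step with L stacked highway layers.

    x      : input at this time step, shape (n_in,)
    s_prev : state from the previous time step, shape (n,)
    Wh, Wt : input projections, shape (n, n_in); applied at layer 1 only
    Rh, Rt : per-layer recurrent matrices, shape (L, n, n)
    bh, bt : per-layer biases, shape (L, n)
    """
    L = Rh.shape[0]
    s = s_prev
    for layer in range(L):
        # The input x enters only the first highway layer
        # (the indicator term in Equations (6)-(9)).
        h_in = Wh @ x if layer == 0 else 0.0
        t_in = Wt @ x if layer == 0 else 0.0
        h = np.tanh(h_in + Rh[layer] @ s + bh[layer])  # candidate update H
        t = sigmoid(t_in + Rt[layer] @ s + bt[layer])  # transform gate T
        s = h * t + s * (1.0 - t)  # highway combination, coupled carry c = 1 - t
    return s  # also serves as the step's output y_t
```

For a full sequence, `s` would be initialized to zeros and `rhn_step` applied at each time step, with each step's final state serving as both the output and the next step's `s_prev`.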