Language Modeling with Recurrent Highway Hypernetworks

Authors: Joseph Suarez

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present extensive experimental and theoretical support for the efficacy of recurrent highway networks (RHNs) and recurrent hypernetworks complementary to the original works. Where the original RHN work primarily provides theoretical treatment of the subject, we demonstrate experimentally that RHNs benefit from far better gradient flow than LSTMs in addition to their improved task accuracy. The original hypernetworks work presents detailed experimental results but leaves several theoretical issues unresolved; we consider these in depth and frame several feasible solutions that we believe will yield further gains in the future. We demonstrate that these approaches are complementary: by combining RHNs and hypernetworks, we make a significant improvement over current state-of-the-art character-level language modeling performance on Penn Treebank while relying on much simpler regularization. Finally, we argue for RHNs as a drop-in replacement for LSTMs (analogous to LSTMs for vanilla RNNs) and for hypernetworks as a de-facto augmentation (analogous to attention) for recurrent architectures. (A hedged sketch of the combined RHN-hypernetwork cell follows the table.)
Researcher Affiliation | Academia | Joseph Suarez, Stanford University, joseph15@stanford.edu
Pseudocode | No | The paper presents mathematical equations for recurrent cells but does not include a block explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | open sourcing (code, footnote 3) a combined architecture that obtains SOTA on PTB... Footnote 3: github.com/jsuarez5341/Recurrent-Highway-Hypernetworks-NIPS
Open Datasets | Yes | Penn Treebank (PTB) contains approximately 5.10M/0.40M/0.45M characters in the train/val/test sets respectively... [16] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
Dataset Splits | Yes | Penn Treebank (PTB) contains approximately 5.10M/0.40M/0.45M characters in the train/val/test sets respectively
Hardware Specification | Yes | on a single GTX 1080 Ti
Software Dependencies | No | The paper mentions using the 'Adam' optimizer but does not specify any software library versions (e.g., PyTorch 1.x, TensorFlow 2.x) required for replication.
Experiment Setup | Yes | We train all models using Adam [20] with the default learning rate 0.001 and sequence length 100, batch size 256... Both subnetworks use a recurrent dropout keep probability of 0.65 and no other regularizer/normalizer.
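
For readers checking the architecture claim in the Research Type row, below is a minimal PyTorch sketch of a recurrent highway cell modulated by a small hypernetwork. It assumes coupled carry gates (c = 1 - t) as in the original RHN paper and elementwise scaling of the main cell's pre-activations as in the hypernetworks paper; the class names RHNCell and HyperRHN, the single projection for the scale vector, and all dimensions are illustrative assumptions, not the author's released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class RHNCell(nn.Module):
    """Minimal recurrent highway cell with coupled carry gate c = 1 - t.
    An optional per-unit scale vector (from a hypernetwork) modulates the pre-activations."""
    def __init__(self, input_dim, hidden_dim, depth):
        super().__init__()
        self.depth = depth
        # The input feeds only the first micro-layer of the recurrence depth.
        self.W = nn.Linear(input_dim, 2 * hidden_dim)
        # One recurrent transform per micro-layer (H and T streams stacked).
        self.R = nn.ModuleList([nn.Linear(hidden_dim, 2 * hidden_dim) for _ in range(depth)])

    def forward(self, x, s, scale=None):
        for l in range(self.depth):
            pre = self.R[l](s)
            if l == 0:
                pre = pre + self.W(x)
            if scale is not None:           # hypernetwork modulation, applied elementwise
                pre = pre * scale
            h, t = pre.chunk(2, dim=-1)
            h, t = torch.tanh(h), torch.sigmoid(t)
            s = h * t + s * (1.0 - t)       # highway update with coupled gates
        return s

class HyperRHN(nn.Module):
    """A small RHN emits a scale vector that modulates the main RHN at every timestep."""
    def __init__(self, input_dim, hidden_dim, hyper_dim, depth):
        super().__init__()
        self.hyper = RHNCell(input_dim, hyper_dim, depth)
        self.main = RHNCell(input_dim, hidden_dim, depth)
        self.project = nn.Linear(hyper_dim, 2 * hidden_dim)  # scales both the H and T streams

    def forward(self, x, state):
        s_hyper, s_main = state
        s_hyper = self.hyper(x, s_hyper)
        scale = self.project(s_hyper)
        s_main = self.main(x, s_main, scale=scale)
        return s_main, (s_hyper, s_main)
```

The coupled gate keeps the state update a convex combination of the transform and carry paths, and emitting per-unit scale vectors rather than full weight matrices keeps the hypernetwork's parameter count small, which is the design choice the hypernetworks line of work motivates.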
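Likewise, the Experiment Setup row translates into a short training-loop sketch. The sequence length, batch size, learning rate, and recurrent keep probability are the values quoted above; the vocabulary and layer sizes, the embedding, the variational-style dropout mask, and the train_step helper are assumptions made only to keep the example self-contained (it reuses the HyperRHN sketch from the previous block).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Values quoted in the Experiment Setup row; model sizes below are illustrative only.
VOCAB, EMBED, HIDDEN, HYPER, DEPTH = 50, 64, 512, 128, 3
SEQ_LEN, BATCH_SIZE, LR, KEEP_PROB = 100, 256, 1e-3, 0.65

embed = nn.Embedding(VOCAB, EMBED)
cell = HyperRHN(EMBED, HIDDEN, HYPER, DEPTH)        # from the sketch above
head = nn.Linear(HIDDEN, VOCAB)
params = list(embed.parameters()) + list(cell.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=LR)         # Adam with the default 0.001 learning rate

def train_step(chars):
    """chars: LongTensor of shape (batch, SEQ_LEN + 1) holding character indices."""
    s_hyper = torch.zeros(chars.size(0), HYPER)
    s_main = torch.zeros(chars.size(0), HIDDEN)
    # One dropout mask per sequence on the recurrent state (an assumed variational-style
    # scheme) with the quoted keep probability of 0.65.
    mask = torch.bernoulli(torch.full_like(s_main, KEEP_PROB)) / KEEP_PROB
    loss = 0.0
    for t in range(SEQ_LEN):
        x = embed(chars[:, t])
        _, (s_hyper, s_main) = cell(x, (s_hyper, s_main * mask))
        loss = loss + F.cross_entropy(head(s_main), chars[:, t + 1])
    optimizer.zero_grad()
    (loss / SEQ_LEN).backward()
    optimizer.step()
    return loss.item() / SEQ_LEN

# Example usage with random character indices:
# train_step(torch.randint(0, VOCAB, (BATCH_SIZE, SEQ_LEN + 1)))
```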