Fast-Slow Recurrent Neural Networks

Authors: Asier Mujika, Florian Meier, Angelika Steger

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the FS-RNN on two character-level language modeling data sets, Penn Treebank and Hutter Prize Wikipedia, where we improve state-of-the-art results to 1.19 and 1.25 bits-per-character (BPC), respectively. In addition, an ensemble of two FS-RNNs achieves 1.20 BPC on Hutter Prize Wikipedia, outperforming the best known compression algorithm with respect to the BPC measure. We also present an empirical investigation of the learning and network dynamics of the FS-RNN, which explains the improved performance compared to other RNN architectures.
Researcher Affiliation | Academia | Asier Mujika, Department of Computer Science, ETH Zürich, Switzerland, asierm@ethz.ch; Florian Meier, Department of Computer Science, ETH Zürich, Switzerland, meierflo@inf.ethz.ch; Angelika Steger, Department of Computer Science, ETH Zürich, Switzerland, steger@inf.ethz.ch
Pseudocode | No | The paper specifies the FS-RNN architecture through formal update equations but does not present a separate pseudocode block or algorithm. An illustrative sketch of those update rules appears after this table.
Open Source Code | Yes | We provide our code at the following URL: https://github.com/amujika/Fast-Slow-LSTM.
Open Datasets | Yes | The FS-LSTM is evaluated on two character-level language modeling data sets, namely Penn Treebank and Hutter Prize Wikipedia, which will be referred to as enwik8 in this section. Penn Treebank [28]: the dataset is a collection of Wall Street Journal articles written in English. ... Following [30], we split the data set into train, validation and test sets consisting of 5.1M, 400K and 450K characters, respectively. Hutter Prize Wikipedia [19]: this dataset is also known as enwik8 and consists of "raw" Wikipedia data... Following [7], we split the data set into train, validation and test sets consisting of 90M, 5M and 5M characters, respectively.
Dataset Splits | Yes | Following [30], we split the data set into train, validation and test sets consisting of 5.1M, 400K and 450K characters, respectively. (Penn Treebank) Following [7], we split the data set into train, validation and test sets consisting of 90M, 5M and 5M characters, respectively. (enwik8) A mechanical sketch of the enwik8 split appears after the table.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer, dropout, zoneout, and layer normalization, but does not provide specific version numbers for any of them. (A brief zoneout sketch appears after the table.)
Experiment Setup | Yes | Table 3: Hyperparameters for the character-level language model experiments. The table lists non-recurrent dropout, cell zoneout, hidden zoneout, fast cell size, slow cell size, TBPTT length, minibatch size, input embedding size, initial learning rate, and epochs, with specific values for Penn Treebank and enwik8. (A truncated-BPTT training-loop sketch also appears after the table.)
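
The paper's update equations (a fast LSTM cell applied k times per time step, with a slow LSTM cell updated once between the first and second fast updates) translate directly into code. Below is a minimal sketch under the following assumptions: PyTorch's nn.LSTMCell is used for all cells, the fast cells F3..Fk share one set of weights, and a dummy zero input is fed to the fast cells that the equations define without an external input. The class and variable names are illustrative and do not come from the authors' repository.

    import torch
    import torch.nn as nn

    class FSRNNCell(nn.Module):
        """Sketch of one FS-RNN time step: k fast LSTM updates, one slow LSTM update."""

        def __init__(self, input_size, fast_size, slow_size, k=4):
            super().__init__()
            assert k >= 2, "the scheme needs at least two fast updates per step"
            self.k = k
            self.fast1 = nn.LSTMCell(input_size, fast_size)   # F1: sees the input x_t
            self.fast2 = nn.LSTMCell(slow_size, fast_size)    # F2: sees the slow state
            self.fast_rest = nn.LSTMCell(1, fast_size)        # F3..Fk: no external input
            self.slow = nn.LSTMCell(fast_size, slow_size)     # S: updated once per step

        def forward(self, x_t, fast_state, slow_state):
            h_f, c_f = fast_state
            h_s, c_s = slow_state
            # h^{F_1}_t = f^{F_1}(h^{F_k}_{t-1}, x_t)
            h_f, c_f = self.fast1(x_t, (h_f, c_f))
            # h^S_t = f^S(h^S_{t-1}, h^{F_1}_t)
            h_s, c_s = self.slow(h_f, (h_s, c_s))
            # h^{F_2}_t = f^{F_2}(h^{F_1}_t, h^S_t)
            h_f, c_f = self.fast2(h_s, (h_f, c_f))
            # h^{F_i}_t = f^{F_i}(h^{F_{i-1}}_t) for 3 <= i <= k (dummy zero input)
            dummy = x_t.new_zeros(x_t.size(0), 1)
            for _ in range(self.k - 2):
                h_f, c_f = self.fast_rest(dummy, (h_f, c_f))
            # The next-character prediction is made from the last fast hidden state.
            return h_f, (h_f, c_f), (h_s, c_s)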
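
The quoted enwik8 split (90M/5M/5M characters, taken in order) is mechanical to reproduce. A minimal sketch, assuming the raw enwik8 file has been downloaded to the working directory; the file name and variable names are illustrative.

    # Split the raw enwik8 byte stream into train/validation/test sets of
    # 90M, 5M and 5M characters, in that order.
    with open("enwik8", "rb") as f:
        data = f.read()

    n_train, n_valid = 90_000_000, 5_000_000
    train = data[:n_train]
    valid = data[n_train:n_train + n_valid]
    test = data[n_train + n_valid:n_train + 2 * n_valid]

    print(len(train), len(valid), len(test))  # expected: 90000000 5000000 5000000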
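
Zoneout, listed above among the regularizers, stochastically keeps a unit's previous recurrent value instead of zeroing it the way dropout would. A minimal sketch in the formulation of Krueger et al.; the function name is illustrative and this is not the authors' implementation.

    import torch

    def zoneout(prev_state, new_state, rate, training):
        """During training, keep each unit's previous value with probability `rate`;
        at evaluation time, use the expected mixture of old and new states."""
        if training:
            keep = torch.bernoulli(torch.full_like(new_state, rate))
            return keep * prev_state + (1.0 - keep) * new_state
        return rate * prev_state + (1.0 - rate) * new_state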
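
The "TBPTT length" hyperparameter refers to truncated backpropagation through time: the character stream is consumed in fixed-length windows, gradients are computed within each window, and the recurrent states are carried across windows with the gradient path cut at the boundary. A minimal training-loop sketch under those assumptions; model, optimizer and loss_fn are placeholders (the output projection to character logits is omitted), not the authors' setup.

    import torch

    def train_tbptt(model, optimizer, loss_fn, inputs, targets, tbptt_len, init_states):
        """Truncated BPTT: step through the sequence in windows of `tbptt_len` steps,
        detaching the recurrent states between windows so gradients stay local."""
        fast_state, slow_state = init_states
        for start in range(0, inputs.size(0), tbptt_len):
            x_win = inputs[start:start + tbptt_len]    # (tbptt_len, batch, embed)
            y_win = targets[start:start + tbptt_len]   # (tbptt_len, batch) next-char ids
            optimizer.zero_grad()
            loss = 0.0
            for x_t, y_t in zip(x_win, y_win):
                out, fast_state, slow_state = model(x_t, fast_state, slow_state)
                loss = loss + loss_fn(out, y_t)
            loss.backward()
            optimizer.step()
            # Carry the states into the next window, but cut the gradient path here.
            fast_state = tuple(s.detach() for s in fast_state)
            slow_state = tuple(s.detach() for s in slow_state)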