Fast-Slow Recurrent Neural Networks
Authors: Asier Mujika, Florian Meier, Angelika Steger
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the FS-RNN on two character level language modeling data sets, Penn Treebank and Hutter Prize Wikipedia, where we improve state of the art results to 1.19 and 1.25 bits-per-character (BPC), respectively. In addition, an ensemble of two FS-RNNs achieves 1.20 BPC on Hutter Prize Wikipedia outperforming the best known compression algorithm with respect to the BPC measure. We also present an empirical investigation of the learning and network dynamics of the FS-RNN, which explains the improved performance compared to other RNN architectures. |
| Researcher Affiliation | Academia | Asier Mujika, Department of Computer Science, ETH Zürich, Switzerland (asierm@ethz.ch); Florian Meier, Department of Computer Science, ETH Zürich, Switzerland (meierflo@inf.ethz.ch); Angelika Steger, Department of Computer Science, ETH Zürich, Switzerland (steger@inf.ethz.ch) |
| Pseudocode | No | The paper specifies the FS-RNN architecture through formal update rules written as mathematical equations, but it does not present a separate pseudocode block or algorithm listing; a hedged code sketch of these update rules is given after this table. |
| Open Source Code | Yes | We provide our code in the following URL https://github.com/amujika/Fast-Slow-LSTM. |
| Open Datasets | Yes | The FS-LSTM is evaluated on two character level language modeling data sets, namely Penn Treebank and Hutter Prize Wikipedia, which will be referred to as enwik8 in this section. Penn Treebank [28] The dataset is a collection of Wall Street Journal articles written in English. ... Following [30], we split the data set into train, validation and test sets consisting of 5.1M, 400K and 450K characters, respectively. Hutter Prize Wikipedia [19] This dataset is also known as enwik8 and it consists of "raw" Wikipedia data... Following [7], we split the data set into train, validation and test sets consisting of 90M, 5M and 5M characters, respectively. |
| Dataset Splits | Yes | For Penn Treebank: "Following [30], we split the data set into train, validation and test sets consisting of 5.1M, 400K and 450K characters, respectively." For enwik8: "Following [7], we split the data set into train, validation and test sets consisting of 90M, 5M and 5M characters, respectively." A loading sketch for these contiguous splits is given after this table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as Adam optimizer, dropout, Zoneout, and layer normalization, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Table 3 ("Hyperparameters for the character-level language model experiments") reports Non-recurrent dropout, Cell zoneout, Hidden zoneout, Fast cell size, Slow cell size, TBPTT length, Minibatch size, Input embedding size, Initial learning rate, and Epochs, with separate values for Penn Treebank and enwik8; a configuration skeleton mirroring these fields is sketched after this table. |
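
Below is a minimal sketch of the FS-RNN update rules described in the paper, with PyTorch `LSTMCell` standing in for the paper's LSTM cells; zoneout, layer normalization, and dropout are omitted, and the class and variable names (`FSRNNCell`, `fast_cells`, `slow_cell`) are illustrative rather than taken from the authors' repository.

```python
# Sketch of one FS-RNN time step: k "fast" cells run sequentially,
# one "slow" cell runs once per character.
import torch
import torch.nn as nn


class FSRNNCell(nn.Module):
    def __init__(self, input_size: int, fast_size: int, slow_size: int, k: int):
        super().__init__()
        assert k >= 2, "the FS-RNN uses at least two fast cells"
        self.k = k
        # F1 reads the character embedding x_t; F2 reads the slow state h_t^S;
        # F3..Fk receive no external input in the paper, so we feed a width-1 dummy zero.
        self.fast_cells = nn.ModuleList(
            [nn.LSTMCell(input_size, fast_size), nn.LSTMCell(slow_size, fast_size)]
            + [nn.LSTMCell(1, fast_size) for _ in range(k - 2)]
        )
        self.slow_cell = nn.LSTMCell(fast_size, slow_size)

    def forward(self, x_t, fast_state, slow_state):
        # h_t^{F1} = f^{F1}(h_{t-1}^{Fk}, x_t)
        fast_state = self.fast_cells[0](x_t, fast_state)
        # h_t^{S}  = f^{S}(h_{t-1}^{S}, h_t^{F1})
        slow_state = self.slow_cell(fast_state[0], slow_state)
        # h_t^{F2} = f^{F2}(h_t^{F1}, h_t^{S})
        fast_state = self.fast_cells[1](slow_state[0], fast_state)
        # h_t^{Fi} = f^{Fi}(h_t^{F(i-1)}) for i = 3..k
        dummy = x_t.new_zeros(x_t.size(0), 1)
        for cell in self.fast_cells[2:]:
            fast_state = cell(dummy, fast_state)
        # The next-character distribution is predicted from h_t^{Fk}.
        return fast_state, slow_state
```

Initial states can be zero tensors of shapes `(batch, fast_size)` and `(batch, slow_size)`. The sequential fast cells take several transition steps per character while the slow cell updates only once, which is the mechanism the paper credits for capturing both short- and long-term dependencies.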
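The contiguous character-level splits quoted in the Dataset Splits row can be sliced as in the sketch below, shown for enwik8 (90M / 5M / 5M characters); the file path and helper name are assumptions for illustration, since the paper follows [7] (enwik8) and [30] (Penn Treebank) but gives no loading code.

```python
# Illustrative contiguous split of the raw enwik8 byte stream.
def split_enwik8(path="enwik8", n_train=90_000_000, n_valid=5_000_000, n_test=5_000_000):
    with open(path, "rb") as f:
        data = f.read()
    train = data[:n_train]
    valid = data[n_train:n_train + n_valid]
    test = data[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test
```

The Penn Treebank split (5.1M / 400K / 450K characters) follows the same pattern with the corresponding sizes.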
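The following configuration skeleton mirrors the hyperparameter names in Table 3; the concrete values differ between Penn Treebank and enwik8 and should be copied from the paper's Table 3 rather than from this sketch. The dataclass and field names are our own rendering of the table's row labels.

```python
# Hyperparameter skeleton for the FS-LSTM experiments; fill values from Table 3 of the paper.
from dataclasses import dataclass


@dataclass
class FSLSTMConfig:
    non_recurrent_dropout: float
    cell_zoneout: float
    hidden_zoneout: float
    fast_cell_size: int
    slow_cell_size: int
    tbptt_length: int        # truncated back-propagation-through-time window
    minibatch_size: int
    input_embedding_size: int
    initial_learning_rate: float
    epochs: int
```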