Dilated Recurrent Neural Networks
Authors: Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark A. Hasegawa-Johnson, Thomas S. Huang
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the DILATEDRNN in multiple RNN settings on a variety of sequential learning tasks, including long-term memorization, pixel-by-pixel classification of handwritten digits (with permutation and noise), character-level language modeling, and speaker identification with raw audio waveforms. |
| Researcher Affiliation | Collaboration | 1IBM Thomas J. Watson Research Center, Yorktown, NY 10598, USA 2University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA |
| Pseudocode | No | The paper describes the architecture and processes using mathematical equations and textual explanations, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our method is publicly available: https://github.com/code-terminator/DilatedRNN |
| Open Datasets | Yes | We empirically validate the DILATEDRNN in multiple RNN settings on a variety of sequential learning tasks, including long-term memorization, pixel-by-pixel classification of handwritten digits (with permutation and noise), character-level language modeling on the Penn Treebank [16], and speaker identification with raw audio waveforms on VCTK [26]. |
| Dataset Splits | Yes | Training, validation and testing sets are the default ones in Tensorflow. Hyperparameters and results are reported in table 1. [...] Results are reported for trained models that achieve the best validation loss. |
| Hardware Specification | Yes | Notably, the model with dilation starting at 64 is able to train within 17 minutes by using a single Nvidia P-100 GPU while maintaining a 93.5% test accuracy. |
| Software Dependencies | No | Unless specified otherwise, all the models are implemented with Tensorflow [1]. No specific version number for TensorFlow or any other software dependency is provided. |
| Experiment Setup | Yes | Unless specified otherwise, all the models are implemented with Tensorflow [1]. We use the default nonlinearities and RMSProp optimizer [21] with learning rate 0.001 and decay rate of 0.9. All weight matrices are initialized by the standard normal distribution. The batch size is set to 128. Furthermore, in all the experiments, we apply the sequence classification setting [25], where the output layer only adds at the end of the sequence. Results are reported for trained models that achieve the best validation loss. |
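
The Experiment Setup row reports the optimizer, initialization, batch size, and sequence-classification output placement, but no code. Below is a minimal, hedged sketch of that training configuration using TensorFlow/Keras APIs; the paper's exact TensorFlow version is not specified (see the Software Dependencies row), and the recurrent layer and layer sizes here are illustrative placeholders, not the authors' DilatedRNN cells.

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import tensorflow as tf

BATCH_SIZE = 128  # reported batch size

# RMSProp with learning rate 0.001 and decay rate 0.9, as reported.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)

# All weight matrices initialized from the standard normal distribution.
init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=1.0)

# Sequence-classification setting: the output layer is applied only to the
# final hidden state of the sequence (layer sizes are illustrative only).
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(64,
                              kernel_initializer=init,
                              recurrent_initializer=init,
                              return_sequences=False),   # keep only the last state
    tf.keras.layers.Dense(10, kernel_initializer=init),  # e.g. 10 classes for MNIST
])
model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```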
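
The Pseudocode row notes that the architecture is described only through equations. As a reading aid, the sketch below illustrates the core idea of a dilated recurrent layer: at a layer with dilation s, the cell at step t receives the hidden state from step t − s rather than t − 1. The cell definition, weight shapes, and toy data are assumptions for illustration and are not the authors' implementation.

```python
# Minimal NumPy illustration of a dilated recurrent skip connection.
import numpy as np

def dilated_rnn_layer(inputs, dilation, hidden_size, cell):
    """inputs: (seq_len, input_size) array; `cell` is any (x_t, h_prev) -> h_t map.
    Shapes and names here are illustrative assumptions."""
    seq_len = inputs.shape[0]
    hidden = np.zeros((seq_len, hidden_size))
    for t in range(seq_len):
        # Dilated connection: look back `dilation` steps (zero state before start).
        h_prev = hidden[t - dilation] if t >= dilation else np.zeros(hidden_size)
        hidden[t] = cell(inputs[t], h_prev)
    return hidden

# Toy vanilla-RNN cell with tanh nonlinearity, just to make the sketch runnable.
rng = np.random.default_rng(0)
W_x = rng.standard_normal((8, 3))   # input-to-hidden weights (standard normal init)
W_h = rng.standard_normal((8, 8))   # hidden-to-hidden weights
cell = lambda x, h: np.tanh(W_x @ x + W_h @ h)

x = rng.standard_normal((16, 3))    # a toy length-16 input sequence
h = dilated_rnn_layer(x, dilation=4, hidden_size=8, cell=cell)
```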