Dilated Recurrent Neural Networks

Authors: Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark A. Hasegawa-Johnson, Thomas S. Huang

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate the DILATEDRNN in multiple RNN settings on a variety of sequential learning tasks, including long-term memorization, pixel-by-pixel classification of handwritten digits (with permutation and noise), character-level language modeling, and speaker identification with raw audio waveforms.
Researcher Affiliation | Collaboration | IBM Thomas J. Watson Research Center, Yorktown, NY 10598, USA; University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Pseudocode | No | The paper describes the architecture and processes using mathematical equations and textual explanations, but it does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for our method is publicly available: https://github.com/code-terminator/DilatedRNN
Open Datasets | Yes | We empirically validate the DILATEDRNN in multiple RNN settings on a variety of sequential learning tasks, including long-term memorization, pixel-by-pixel classification of handwritten digits (with permutation and noise), character-level language modeling on the Penn Treebank [16], and speaker identification with raw audio waveforms on VCTK [26].
Dataset Splits | Yes | Training, validation and testing sets are the default ones in Tensorflow. Hyperparameters and results are reported in table 1. [...] Results are reported for trained models that achieve the best validation loss.
Hardware Specification | Yes | Notably, the model with dilation starting at 64 is able to train within 17 minutes by using a single Nvidia P-100 GPU while maintaining a 93.5% test accuracy.
Software Dependencies | No | Unless specified otherwise, all the models are implemented with Tensorflow [1]. No specific version number for TensorFlow or any other software dependency is provided.
Experiment Setup | Yes | Unless specified otherwise, all the models are implemented with Tensorflow [1]. We use the default nonlinearities and RMSProp optimizer [21] with learning rate 0.001 and decay rate of 0.9. All weight matrices are initialized by the standard normal distribution. The batch size is set to 128. Furthermore, in all the experiments, we apply the sequence classification setting [25], where the output layer only adds at the end of the sequence. Results are reported for trained models that achieve the best validation loss.
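
The Pseudocode row above notes that the paper specifies the model only through equations; its core operation is the dilated recurrent skip connection c_t^(l) = f(x_t^(l), c_{t-s^(l)}^(l)), where s^(l) is the dilation of layer l. The following is a minimal NumPy sketch of that recurrence with a vanilla RNN cell and exponentially increasing dilations; the function names, shapes, and choice of cell are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def dilated_rnn_layer(inputs, W_x, W_h, b, dilation):
    """Sketch of one layer with a dilated recurrent skip connection.

    inputs: array of shape (T, input_size), one row per time step.
    dilation: s^(l); the state at step t depends on the state at step t - s^(l).
    Returns the hidden states, shape (T, hidden_size).
    """
    T = inputs.shape[0]
    hidden_size = W_h.shape[0]
    states = np.zeros((T, hidden_size))
    for t in range(T):
        # c_{t - s^(l)}^(l): a zero state for the first `dilation` steps.
        prev = states[t - dilation] if t >= dilation else np.zeros(hidden_size)
        states[t] = np.tanh(inputs[t] @ W_x + prev @ W_h + b)
    return states

def dilated_rnn(inputs, params, dilations=(1, 2, 4, 8)):
    # Stack layers with exponentially increasing dilations, as in the paper's
    # multi-layer construction; `params` holds (W_x, W_h, b) per layer.
    x = inputs
    for (W_x, W_h, b), s in zip(params, dilations):
        x = dilated_rnn_layer(x, W_x, W_h, b, s)
    return x
```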
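
The Dataset Splits row quotes the paper as relying on TensorFlow's default MNIST splits, and the Research Type and Open Datasets rows mention pixel-by-pixel digit classification with permutation. A hedged data-preparation sketch is below; it uses tf.keras.datasets.mnist because the paper's TensorFlow version is unspecified, and the 55,000/5,000 train/validation carve-out is an assumption that mirrors the split older TensorFlow MNIST readers exposed.

```python
import numpy as np
import tensorflow as tf

# Load MNIST; tf.keras ships a fixed 60,000/10,000 train/test split.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Assumption: hold out the last 5,000 training images as a validation set,
# mirroring the 55,000/5,000/10,000 split of older TensorFlow MNIST readers.
x_val, y_val = x_train[55000:], y_train[55000:]
x_train, y_train = x_train[:55000], y_train[:55000]

def to_pixel_sequences(images, permutation=None):
    # Pixel-by-pixel setting: flatten each 28x28 image into a 784-step
    # sequence with one pixel per time step, scaled to [0, 1].
    seqs = images.reshape(len(images), 28 * 28, 1).astype("float32") / 255.0
    if permutation is not None:           # permuted-MNIST variant
        seqs = seqs[:, permutation, :]
    return seqs

rng = np.random.RandomState(0)            # fixed permutation; seed is an assumption
perm = rng.permutation(28 * 28)
x_train_seq = to_pixel_sequences(x_train, perm)
```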
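
The Experiment Setup row fixes the optimizer (RMSProp, learning rate 0.001, decay rate 0.9), standard-normal weight initialization, batch size 128, and the sequence classification setting in which the output layer is attached only to the final time step. The sketch below expresses those settings with the tf.keras API as a stand-in; the stock LSTM used in place of the DILATEDRNN cell, the 256-unit width, and mapping the paper's "decay rate" to RMSprop's rho argument are all assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 10          # e.g. digit classes for pixel-by-pixel MNIST
SEQ_LEN, INPUT_DIM = 784, 1

# All weight matrices initialized from the standard normal distribution.
init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=1.0)

# Placeholder recurrent stack: a stock LSTM stands in for the DILATEDRNN cell;
# return_sequences defaults to False, so only the last hidden state is kept,
# matching the sequence classification setting (output layer at the end only).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, INPUT_DIM)),
    tf.keras.layers.LSTM(256, kernel_initializer=init,
                         recurrent_initializer=init),
    tf.keras.layers.Dense(NUM_CLASSES, kernel_initializer=init),
])

# RMSProp with learning rate 0.001; the paper's "decay rate of 0.9" is mapped
# to Keras' rho argument, which is an assumption about the intended parameter.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Batch size 128; model selection by best validation loss, as stated in the paper.
# model.fit(x_train_seq, y_train, batch_size=128, epochs=..., validation_data=...)
```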