Exploring Sparsity in Recurrent Neural Networks

Authors: Sharan Narang, Greg Diamos, Shubho Sengupta, Erich Elsen

ICLR 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We run all our experiments on a training set of 2100 hours of English speech data and a validation set of 3.5 hours of multi-speaker data. |
| Researcher Affiliation | Industry | Sharan Narang, Greg Diamos, Shubho Sengupta & Erich Elsen, Baidu Research, {sharan,gdiamos,ssengupta}@baidu.com; now at Google Brain, eriche@google.com |
| Pseudocode | Yes | Algorithm 1 Pruning Algorithm (a hedged sketch of this pruning schedule follows the table) |
| Open Source Code | No | The paper does not provide an unambiguous statement of, or link to, open-source code for its methodology. |
| Open Datasets | No | The paper states, "We run all our experiments on a training set of 2100 hours of English speech data and a validation set of 3.5 hours of multi-speaker data. This is a small subset of the datasets that we use to train our state-of-the-art automatic speech recognition models.", but does not provide any information about public availability or access. |
| Dataset Splits | Yes | We run all our experiments on a training set of 2100 hours of English speech data and a validation set of 3.5 hours of multi-speaker data. |
| Hardware Specification | Yes | The performance benchmark was run using NVIDIA's cuDNN and cuSPARSE libraries on a Titan X Maxwell GPU and compiled using CUDA 7.5. |
| Software Dependencies | Yes | The performance benchmark was run using NVIDIA's cuDNN and cuSPARSE libraries on a Titan X Maxwell GPU and compiled using CUDA 7.5. |
| Experiment Setup | Yes | We train the models using Nesterov SGD for 20 epochs. Besides the hyper-parameters for determining the threshold, all other hyper-parameters remain unchanged between the dense and sparse training runs. In the sparse run, the pruning begins shortly after the first epoch and continues until the 10th epoch. |
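For context on the "Pseudocode" and "Experiment Setup" rows above: the paper's Algorithm 1 prunes weights during training by zeroing those whose magnitude falls below a threshold that grows over a window of the training run (pruning starts shortly after the first epoch and continues until the 10th epoch). The sketch below illustrates that general idea only; the linear threshold ramp, the iteration counts, and every parameter name are illustrative assumptions, not the authors' exact schedule or hyper-parameters.

```python
# Minimal sketch of magnitude-based gradual pruning during training.
# Assumed schedule: the threshold ramps linearly from 0 to a final value
# between a start and end iteration, then holds constant. The paper's
# actual Algorithm 1 derives its threshold from its own hyper-parameters.
import numpy as np


def pruning_threshold(itr, start_itr, end_itr, final_threshold):
    """Return the pruning threshold at iteration `itr` (assumed linear ramp)."""
    if itr < start_itr:
        return 0.0
    if itr >= end_itr:
        return final_threshold
    return final_threshold * (itr - start_itr) / (end_itr - start_itr)


def apply_pruning(weights, mask, threshold):
    """Zero out weights whose magnitude is below the current threshold.
    The mask keeps already-pruned weights at zero for the rest of training."""
    mask &= np.abs(weights) >= threshold
    weights *= mask  # the boolean mask is cast to the weight dtype
    return weights, mask


# Toy usage with a single recurrent weight matrix and a stand-in training loop.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)
mask = np.ones_like(w, dtype=bool)

for itr in range(1000):
    # ... the normal Nesterov SGD update on w would happen here ...
    thr = pruning_threshold(itr, start_itr=100, end_itr=800, final_threshold=0.15)
    w, mask = apply_pruning(w, mask, thr)

print(f"Final sparsity: {1.0 - mask.mean():.1%}")
```

This framing is consistent with the quoted setup, where only the threshold-related hyper-parameters differ between the dense and sparse runs, so pruning acts as a small add-on to an otherwise unchanged training loop.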