Exploring Sparsity in Recurrent Neural Networks
Authors: Sharan Narang, Greg Diamos, Shubho Sengupta, Erich Elsen
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run all our experiments on a training set of 2100 hours of English speech data and a validation set of 3.5 hours of multi-speaker data. |
| Researcher Affiliation | Industry | Sharan Narang, Greg Diamos, Shubho Sengupta & Erich Elsen, Baidu Research, {sharan,gdiamos,ssengupta}@baidu.com; Erich Elsen now at Google Brain, eriche@google.com |
| Pseudocode | Yes | Algorithm 1: Pruning Algorithm (see the sketch after the table). |
| Open Source Code | No | The paper does not provide an unambiguous statement or link for the open-source code of their methodology. |
| Open Datasets | No | The paper states, "We run all our experiments on a training set of 2100 hours of English speech data and a validation set of 3.5 hours of multi-speaker data. This is a small subset of the datasets that we use to train our state-of-the-art automatic speech recognition models," but does not provide any information about public availability or access. |
| Dataset Splits | Yes | We run all our experiments on a training set of 2100 hours of English speech data and a validation set of 3.5 hours of multi-speaker data. |
| Hardware Specification | Yes | The performance benchmark was run using NVIDIA's cuDNN and cuSPARSE libraries on a Titan X Maxwell GPU and compiled using CUDA 7.5. |
| Software Dependencies | Yes | The performance benchmark was run using NVIDIA's cuDNN and cuSPARSE libraries on a Titan X Maxwell GPU and compiled using CUDA 7.5. |
| Experiment Setup | Yes | We train the models using Nesterov SGD for 20 epochs. Besides the hyper-parameters for determining the threshold, all other hyper-parameters remain unchanged between the dense and sparse training runs. In the sparse run, the pruning begins shortly after the first epoch and continues until the 10th epoch. |
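
The pseudocode and experiment-setup rows above describe gradual magnitude-based pruning applied during training (pruning active roughly between epochs 1 and 10 of a 20-epoch run). Below is a minimal sketch of that idea, assuming a simple linear ramp for the magnitude threshold; the paper's Algorithm 1 uses a two-phase (start-slope/ramp-slope) schedule with its own hyperparameters, and all names, values, and the single-ramp schedule here are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of gradual magnitude-threshold pruning with a persistent mask.
# Hyperparameter names and the linear threshold ramp are assumptions.
import numpy as np


class GradualMagnitudePruner:
    def __init__(self, start_iter, end_iter, final_threshold):
        # Pruning is active only between start_iter and end_iter.
        self.start_iter = start_iter
        self.end_iter = end_iter
        self.final_threshold = final_threshold

    def current_threshold(self, it):
        """Linearly ramp the magnitude threshold from 0 to final_threshold."""
        if it < self.start_iter:
            return 0.0
        frac = min(1.0, (it - self.start_iter) / (self.end_iter - self.start_iter))
        return frac * self.final_threshold

    def apply(self, weights, masks, it):
        """Zero weights whose magnitude falls below the current threshold.
        Once a weight is pruned, its mask entry stays 0, so it never recovers."""
        eps = self.current_threshold(it)
        for name, w in weights.items():
            masks[name] &= np.abs(w) >= eps
            w *= masks[name]


# Toy usage: prune a single recurrent weight matrix during a mock training loop.
rng = np.random.default_rng(0)
weights = {"W_rec": rng.normal(scale=0.1, size=(128, 128))}
masks = {name: np.ones_like(w, dtype=bool) for name, w in weights.items()}
pruner = GradualMagnitudePruner(start_iter=100, end_iter=1000, final_threshold=0.15)

for it in range(1500):
    # ... one Nesterov SGD step on `weights` would go here ...
    pruner.apply(weights, masks, it)

print(f"final sparsity of W_rec: {1.0 - masks['W_rec'].mean():.1%}")
```

In this sketch the mask is applied after every optimizer step, which mirrors the paper's description that all other training hyperparameters stay unchanged: pruning only intervenes by zeroing small weights on a growing threshold schedule.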