Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip
Authors: Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve speedups of over 6× over the next best algorithm for a hidden layer of size 2304, batch size of 4, and a density of 30%. Further, our technique allows for models of over 5× the size to fit on a GPU for a speedup of 2×, enabling larger networks to help advance the state-of-the-art. We perform case studies on NMT and speech recognition tasks in the appendix, accelerating their recurrent layers by up to 3×. |
| Researcher Affiliation | Industry | Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard & Fung Xie NVIDIA EMAIL |
| Pseudocode | Yes | APPENDIX A: ALGORITHM FOR BANK-AWARE WEIGHT LAYOUT... Algorithm 1: Optimize a row of nonzero weights to minimize bank conflicts |
| Open Source Code | No | The paper describes its methods and algorithms but does not include any explicit statement about making its source code publicly available or provide a repository link for the methodology described. |
| Open Datasets | Yes | We use OpenNMT (Klein et al., 2017) to perform translation from English to German using the WMT15 data set as our training data and the newstest2013 data set for validation. |
| Dataset Splits | Yes | We use OpenNMT (Klein et al., 2017) to perform translation from English to German using the WMT15 data set as our training data and the newstest2013 data set for validation. |
| Hardware Specification | Yes | Our sparse persistent code is compiled in CUDA 9.0, and all tests are run on a NVIDIA Tesla V100. |
| Software Dependencies | Yes | Our sparse persistent code is compiled in CUDA 9.0, and all tests are run on a NVIDIA Tesla V100. |
| Experiment Setup | Yes | Table 1: A naïve implementation has limited performance; our optimizations are necessary to achieve good results. (Layer size = 1152, batch size = 4, density = 10%, #timesteps = 256.) |