Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip
Authors: Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve speedups of over 6× over the next best algorithm for a hidden layer of size 2304, batch size of 4, and a density of 30%. Further, our technique allows for models of over 5× the size to fit on a GPU for a speedup of 2×, enabling larger networks to help advance the state-of-the-art. We perform case studies on NMT and speech recognition tasks in the appendix, accelerating their recurrent layers by up to 3×. |
| Researcher Affiliation | Industry | Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard & Fung Xie, NVIDIA {mzhu,jpool,mandersch,jappleyard,ftse}@nvidia.com |
| Pseudocode | Yes | APPENDIX A: ALGORITHM FOR BANK-AWARE WEIGHT LAYOUT... Algorithm 1: Optimize a row of nonzero weights to minimize bank conflicts (a hedged sketch of such a layout pass appears below the table) |
| Open Source Code | No | The paper describes its methods and algorithms but does not include an explicit statement about making its source code publicly available, nor does it provide a repository link for the methodology described. |
| Open Datasets | Yes | We use OpenNMT (Klein et al., 2017) to perform translation from English to German using the WMT15 data set as our training data and the newstest2013 data set for validation. |
| Dataset Splits | Yes | We use OpenNMT (Klein et al., 2017) to perform translation from English to German using the WMT15 data set as our training data and the newstest2013 data set for validation. |
| Hardware Specification | Yes | Our sparse persistent code is compiled in CUDA 9.0, and all tests are run on a NVIDIA Tesla V100. |
| Software Dependencies | Yes | Our sparse persistent code is compiled in CUDA 9.0, and all tests are run on a NVIDIA Tesla V100. |
| Experiment Setup | Yes | Table 1: A naïve implementation has limited performance; our optimizations are necessary to achieve good results. (Layer size = 1152, batch size = 4, density = 10%, #timesteps = 256.) |
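The paper's Appendix A pseudocode (Algorithm 1) is not reproduced here, so the following is only a minimal greedy sketch of what a bank-aware weight layout pass could look like, assuming 32 shared-memory banks and a least-loaded-bank heuristic; the function name, the tie-breaking rule, and the greedy strategy are assumptions of this sketch, not the authors' exact algorithm.

```python
# Hypothetical sketch of a bank-aware layout pass for one row of nonzero
# weights. Written from the paper's high-level description only; the 32-bank
# assumption and the greedy least-loaded heuristic are this sketch's choices.

NUM_BANKS = 32  # shared-memory banks on recent NVIDIA GPUs (assumption)

def layout_row(nonzero_cols):
    """Reorder one row's nonzero column indices so that successive weights
    prefer shared-memory banks that have been used least so far.

    nonzero_cols: list of column indices holding nonzero weights in this row.
    Returns the reordered list of column indices.
    """
    remaining = list(nonzero_cols)
    ordered = []
    bank_load = [0] * NUM_BANKS  # how many placed weights map to each bank

    while remaining:
        # Greedily pick the remaining column whose bank is least loaded,
        # breaking ties by original column index to keep the pass stable.
        best = min(remaining, key=lambda c: (bank_load[c % NUM_BANKS], c))
        remaining.remove(best)
        bank_load[best % NUM_BANKS] += 1
        ordered.append(best)

    return ordered

if __name__ == "__main__":
    # Toy example: columns 0, 32, and 64 all map to bank 0; the pass
    # interleaves them with columns from other banks.
    print(layout_row([0, 32, 64, 1, 33, 2]))
```

The intent, as the quoted Algorithm 1 title suggests, is that spreading a row's nonzero weights across distinct banks reduces shared-memory bank conflicts when the persistent kernel's threads read their activations; the specific cost model and ordering used by the authors may differ from this sketch.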