Balanced Sparsity for Efficient DNN Inference on GPU
Authors: Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPU, while retaining the same high model accuracy as fine-grained sparsity. |
| Researcher Affiliation | Collaboration | Zhuliang Yao (1,4), Shijie Cao (2,4), Wencong Xiao (3,4), Chen Zhang (4), Lanshun Nie (2); 1: Tsinghua University, 2: Harbin Institute of Technology, 3: Beihang University, 4: Microsoft Research Asia. {v-zhuyao, v-shicao, v-wencxi, zhac}@microsoft.com, nls@hit.edu.cn |
| Pseudocode | Yes | Algorithm 1: Balance-aware Iterative Pruning. Input: the matrix to be pruned, M; the number of blocks per row, BlockNum; the expected sparsity, Sparsity. Output: the pruned matrix, Mp. (A runnable sketch of this procedure follows the table.) |
| Open Source Code | Yes | Please refer to https://github.com/Howal/balanced-sparsity/blob/master/appendix-aaai19.pdf for proof. |
| Open Datasets | Yes | PTB dataset (Marcus et al. 1999), ImageNet ILSVRC-2012 dataset (Krizhevsky, Sutskever, and Hinton 2012), TIMIT dataset |
| Dataset Splits | Yes | VGG-16... dataset has 1.2M training examples and 50k validation examples. |
| Hardware Specification | No | The paper mentions experiments were run 'on GPU' and refers to 'GPU architecture' and 'GPU inference performance test' but does not specify any particular GPU model (e.g., NVIDIA A100, Tesla V100), CPU, or other hardware specifications. |
| Software Dependencies | No | The paper mentions using the 'cuBLAS library', 'cuSPARSE library', and an 'open sourced GPU library (Gray, Radford, and Kingma 2017)' but does not specify version numbers for these software components or any other software dependencies. |
| Experiment Setup | Yes | All the experiments in this section are done with a batch size of 1, the block number per row of our method is 32, and the block size of block sparsity is 8x8, unless explicitly stated. |
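The Algorithm 1 row above quotes only the signature of balance-aware iterative pruning, so a short sketch may help. The NumPy snippet below is a minimal illustration, not the authors' implementation: the function name `balanced_prune`, the `num_iters` schedule, and the use of plain magnitude ranking are assumptions, and the fine-tuning that the paper interleaves between pruning iterations is omitted.

```python
import numpy as np

def balanced_prune(matrix, block_num, sparsity, num_iters=4):
    """Sketch of balance-aware iterative pruning (Algorithm 1).

    Each row of `matrix` is split into `block_num` equal-size blocks,
    and the same number of smallest-magnitude weights is zeroed in
    every block, so all blocks reach identical sparsity.
    """
    pruned = matrix.copy()
    rows, cols = pruned.shape
    assert cols % block_num == 0, "row length must divide evenly into blocks"
    block_size = cols // block_num

    # Raise the sparsity gradually; in the paper each pruning iteration
    # is followed by fine-tuning, which this sketch leaves out.
    for step in range(1, num_iters + 1):
        current_sparsity = sparsity * step / num_iters
        k = int(round(block_size * current_sparsity))  # weights pruned per block
        if k == 0:
            continue
        blocks = pruned.reshape(rows, block_num, block_size)
        # Indices of the k smallest-magnitude weights inside each block
        # (already-zeroed weights sort first, so pruning is monotonic).
        drop = np.argsort(np.abs(blocks), axis=2)[:, :, :k]
        np.put_along_axis(blocks, drop, 0.0, axis=2)
    return pruned
```

With the setup quoted in the last row (32 blocks per row), pruning a 256-column weight matrix to 75% sparsity leaves exactly two non-zeros in every 8-weight block:

```python
W = np.random.randn(8, 256).astype(np.float32)
Wp = balanced_prune(W, block_num=32, sparsity=0.75)
nonzeros_per_block = (Wp.reshape(8, 32, 8) != 0).sum(axis=2)
assert (nonzeros_per_block == 2).all()
```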