Accelerating Sparse Convolution with Column Vector-Wise Sparsity

Authors: Yijun Tan, Kai Han, Kang Zhao, Xianzhi Yu, Zidong Du, Yunji Chen, Yunhe Wang, Jun Yao

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental evaluations demonstrate that our method achieves a 1.7× and 3.2× speedup over the SOTA solution and the dense convolution of ResNet50 on NVIDIA V100 at 75% sparsity, respectively, with only negligible accuracy loss."
Researcher Affiliation | Collaboration | SKL of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Huawei Noah's Ark Lab
Pseudocode | Yes | "Algorithm 1: Sparse convolution computation. Data: row_idx[], filter[], input[]. Result: output[]" (a hedged NumPy sketch of this computation follows the table)
Open Source Code | No | The paper's reproducibility checklist states that code is included (either in supplemental material or as a URL), but the main body gives no specific link or explicit statement on how to access the methodology's source code.
Open Datasets | Yes | "We evaluate our method on several popular CNN models on NVIDIA V100 GPU... Table 1 shows the accuracy of our method compared to unstructured sparsity, where V stands for vector length. OVW permuted shows better accuracy than OVW non-permuted on all CNN models... Table 2 shows results of ResNet50 on ImageNet directly copied from the Shfl_BW paper..."
Dataset Splits | No | The paper mentions fine-tuning and uses standard datasets such as Cifar100 and ImageNet, implying standard splits, but it does not explicitly state split percentages or sample counts for the training, validation, or test sets.
Hardware Specification | Yes | "Experimental evaluations demonstrate that our method achieves a 1.7× and 3.2× speedup over the SOTA solution and the dense convolution of ResNet50 on NVIDIA V100 at 75% sparsity, respectively, with only negligible accuracy loss."
Software Dependencies | No | The paper mentions software such as cuDNN, MindSpore, and CANN, but does not provide version numbers for these or any other software dependencies needed for replication.
Experiment Setup | Yes | "All the results in this table use the same fine-tuning process. We fine-tune each network for 40 epochs after pruning with the same learning rate of 0.0008. Also, each layer can hold a different sparsity ratio thanks to our acceleration of convolution at low sparsity ratios." (a hedged sketch of this fine-tuning schedule follows the table, after the convolution sketch)
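
The Pseudocode row only preserves the header of Algorithm 1 (row_idx[], filter[], input[] as inputs, output[] as result). As a rough illustration of what column vector-wise sparse convolution computes, here is a minimal NumPy sketch in the implicit-GEMM (im2col) view, assuming the weight matrix is pruned in column-direction vectors of length V that span V consecutive output channels. The block layout, function names, and the exact meaning assigned to row_idx here are assumptions for illustration, not the paper's GPU kernel.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold x [C_in, H, W] into [C_in*kh*kw, H_out*W_out] (stride 1, no padding)."""
    c, h, w = x.shape
    h_out, w_out = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, h_out * w_out), dtype=x.dtype)
    r = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                cols[r] = x[ci, i:i + h_out, j:j + w_out].reshape(-1)
                r += 1
    return cols

def sparse_conv_vectorwise(blocks, x, c_out, kh, kw, V):
    """Column vector-wise sparse convolution in the GEMM view (reference sketch).

    Assumed layout: the weight matrix W [c_out, C_in*kh*kw] is pruned so that,
    within each block of V consecutive output channels, a column is either kept
    as a whole V-length vector or dropped entirely.

    blocks[b] = (row_idx, vals) for output channels [b*V, (b+1)*V):
      row_idx: int array of which W columns (= im2col rows) survive pruning
      vals:    [len(row_idx), V] array holding the kept weight vectors
    """
    cols = im2col(x, kh, kw)                        # [C_in*kh*kw, P]
    out = np.zeros((c_out, cols.shape[1]), dtype=x.dtype)
    for b, (row_idx, vals) in enumerate(blocks):
        if len(row_idx):
            # gather only the needed input rows, then one small dense GEMM per block
            out[b * V:(b + 1) * V] = vals.T @ cols[row_idx]
    return out
```

A quick sanity check under the same assumed layout is to scatter each block back into a dense weight matrix W and compare W @ im2col(x) with the output of the sparse routine.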
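
For the experiment setup, the only fine-tuning hyperparameters quoted above are 40 epochs after pruning with a learning rate of 0.0008. A minimal PyTorch-style sketch of that schedule could look as follows; the optimizer, loss, model, and data pipeline are placeholder assumptions not specified by the paper.

```python
import torch

def finetune_after_pruning(model, train_loader, device="cuda"):
    """Fine-tune a pruned network; only the epoch count (40) and learning rate
    (0.0008) come from the paper's quoted setup, the rest are assumptions."""
    model.to(device).train()
    criterion = torch.nn.CrossEntropyLoss()                      # assumed loss
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0008)   # lr from the paper
    for epoch in range(40):                                      # 40 epochs from the paper
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```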