Dynamic Sparsity Is Channel-Level Sparsity Learner
Authors: Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang "Atlas" Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that Chase achieves 1.7× inference throughput speedup on common GPU devices without compromising accuracy with ResNet-50 on ImageNet. We release our codes in https://github.com/luuyin/chase. |
| Researcher Affiliation | Collaboration | 1Eindhoven University of Technology, 2Clemson University, 3University of Liverpool, 4JD Explore Academy, 5University of Texas at Austin |
| Pseudocode | Yes | Algorithm 1: Pseudocode of Chase. Algorithm 2: Overview of Global Parameter Exploration |
| Open Source Code | Yes | We release our codes in https://github.com/luuyin/chase. |
| Open Datasets | Yes | Our evaluation is conducted with two widely used model architectures VGG-19 [48] and ResNet-50 [15] across various datasets including CIFAR-10/100 and ImageNet |
| Dataset Splits | Yes | Our evaluation is conducted with two widely used model architectures VGG-19 [48] and ResNet-50 [15] across various datasets including CIFAR-10/100 and ImageNet |
| Hardware Specification | Yes | All results are averaged from 100 individual runs with one NVIDIA 2080TI GPU in float32 on PyTorch. We set the batch size to 128 for CIFAR-100 and 2 for ImageNet, when evaluating the latency. (See the timing sketch after the table.) |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify its version number. No other specific software dependencies with version numbers are provided. |
| Experiment Setup | Yes | Table 9: Implementation hyperparameters of Chase in Table 3, on CIFAR-10/100. Table 10: Implementation hyperparameters of Chase in Table 4, on ImageNet. These tables detail hyperparameters such as τ_total (epochs), τ_stop (epochs), T (iterations), T_p (iterations), β, BS (batch size), LR (learning rate), LR Drop, Optimizer, Momentum, WD (weight decay), and Sparse Init. |
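
The latency protocol stated in the Hardware Specification row (100 individual runs, float32, a single GPU, batch size 2 for ImageNet) can be approximated with a short PyTorch timing loop. The sketch below is an illustration under assumptions, not the authors' script: it uses a dense torchvision ResNet-50 as a stand-in for the pruned Chase model and assumes 224×224 ImageNet-sized inputs; the released code at https://github.com/luuyin/chase is the authoritative reference for how the paper's numbers were measured.

```python
import time
import torch
import torchvision

# Assumption: a dense ResNet-50 stands in for the channel-pruned Chase model.
device = torch.device("cuda")
model = torchvision.models.resnet50().to(device).eval()
x = torch.randn(2, 3, 224, 224, device=device)  # batch size 2, float32 by default

timings = []
with torch.no_grad():
    for _ in range(10):               # warm-up iterations before timing
        model(x)
    torch.cuda.synchronize()
    for _ in range(100):              # 100 individual runs, as reported in the paper
        start = time.perf_counter()
        model(x)
        torch.cuda.synchronize()      # wait for GPU kernels to finish before stopping the clock
        timings.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(timings) / len(timings):.2f} ms")
```

Synchronizing the GPU before reading the clock matters here: CUDA kernel launches are asynchronous, so omitting `torch.cuda.synchronize()` would measure launch overhead rather than actual inference latency.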