Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets
Authors: Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, conducted on diverse datasets across multiple network backbones, consistently validate our proposal, showing that the hardware acceleration roadblock of LTH is now removed. Specifically, the structural winning tickets obtain up to {64.93%, 64.84%, 60.23%} running time savings at {36%~80%, 74%, 58%} sparsity on {CIFAR, Tiny-ImageNet, ImageNet}, while maintaining comparable accuracy. |
| Researcher Affiliation | Academia | ¹University of Texas at Austin, ²Northeastern University. Correspondence to: Zhangyang Wang <atlaswang@utexas.edu>. |
| Pseudocode | Yes | Algorithm 1: IMP with rewinding step i (a minimal sketch of this procedure follows the table) |
| Open Source Code | Yes | Code is at https://github.com/VITA-Group/Structure-LTH. |
| Open Datasets | Yes | Specifically, we adopt Wide-ResNet-32-2 (Zagoruyko & Komodakis, 2016) (or WRN-32-2), ResNet-18 (He et al., 2016) (or RN-18), MobileNet-v1 (or MBNet-v1) (Howard et al., 2017), and VGG-16 (Simonyan & Zisserman, 2014) on both CIFAR-10 (Krizhevsky et al., 2009) and CIFAR-100 datasets. ResNet-50 (or RN-50) is evaluated on both Tiny-ImageNet (Le & Yang, 2015) and ImageNet (Deng et al., 2009) datasets. |
| Dataset Splits | Yes | We also report the best validation accuracy instead of the best test accuracy. The number of training epochs is increased to 240 and the learning rate is decayed at the 150th, 180th, and 210th epochs. |
| Hardware Specification | Yes | The GPU we use for profiling is NVIDIA RTX 2080 Ti, with a CUDA version of 10.2 and a cuDNN (Chetlur et al., 2014) version of 7.6.5. |
| Software Dependencies | Yes | The GPU we use for profiling is NVIDIA RTX 2080 Ti, with a CUDA version of 10.2 and a cuDNN (Chetlur et al., 2014) version of 7.6.5. |
| Experiment Setup | Yes | Table 1. Implementation details which follow the standard settings in Ma et al. (2021b): Batch Size 128 (for most); Weight Decay 1e-4 (for most); Learning Rate 0.1, decayed by 0.1 at epochs 80 and 120 of 160 total epochs (for CIFAR); Optimizer SGD (Ruder, 2016) with a momentum of 0.9. (A configuration sketch also follows the table.) |
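
The Pseudocode row refers to the paper's Algorithm 1, IMP (Iterative Magnitude Pruning) with rewinding step i. Below is a minimal sketch of that general procedure, not the authors' released code: it assumes a PyTorch model and a user-supplied `train_fn(model, mask, num_steps)` that applies the sparsity mask after every optimizer step; the names `imp_with_rewinding`, `prune_rate`, and `rounds` are illustrative.

```python
import copy
import torch


def imp_with_rewinding(model, train_fn, rewind_step, prune_rate=0.2, rounds=10):
    """Iteratively prune a fraction of the remaining weights per round and
    rewind the surviving weights to their values at `rewind_step`."""
    # Prune only multi-dimensional weight tensors (skip biases / BN params).
    mask = {name: torch.ones_like(p) for name, p in model.named_parameters()
            if p.dim() > 1}

    # Train briefly up to the rewind point and snapshot the weights there.
    train_fn(model, mask, num_steps=rewind_step)
    rewind_state = copy.deepcopy(model.state_dict())

    for _ in range(rounds):
        # Train the currently masked (sparse) network to convergence;
        # train_fn is assumed to re-apply `mask` after each update.
        train_fn(model, mask, num_steps=None)

        # Global magnitude pruning over the weights that are still alive.
        alive = torch.cat([(p.detach() * mask[n]).abs().flatten()
                           for n, p in model.named_parameters() if n in mask])
        alive = alive[alive > 0]
        k = max(1, int(prune_rate * alive.numel()))
        threshold = torch.kthvalue(alive, k).values

        for n, p in model.named_parameters():
            if n in mask:
                mask[n] *= (p.detach().abs() > threshold).float()

        # Rewind every surviving weight to its early-training value.
        model.load_state_dict(rewind_state)

    return mask
```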
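
For the Experiment Setup row, the quoted Table 1 hyperparameters (SGD with momentum 0.9, weight decay 1e-4, batch size 128, learning rate 0.1 decayed by 0.1 at epochs 80 and 120 of 160 for CIFAR) map onto standard PyTorch training objects. The sketch below is an assumed translation of those settings, not the released training script; `model` and `train_set` are placeholders for an existing network and CIFAR dataset.

```python
import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # Batch Size 128

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,              # initial learning rate for CIFAR
    momentum=0.9,        # SGD momentum quoted in Table 1
    weight_decay=1e-4,   # weight decay "for most" settings
)

# Decay the learning rate by 0.1 at epochs 80 and 120 of 160 total (CIFAR).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1
)

for epoch in range(160):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```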