Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

Authors: Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments, conducted on diverse datasets across multiple network backbones, consistently validate our proposal, showing that the hardware acceleration roadblock of LTH is now removed. Specifically, the structural winning tickets obtain up to {64.93%, 64.84%, 60.23%} running time savings at {36%∼80%, 74%, 58%} sparsity on {CIFAR, Tiny-ImageNet, ImageNet}, while maintaining comparable accuracy.
Researcher Affiliation | Academia | 1University of Texas at Austin, 2Northeastern University. Correspondence to: Zhangyang Wang <atlaswang@utexas.edu>.
Pseudocode | Yes | Algorithm 1: IMP with rewinding step i. (A minimal IMP sketch follows this table.)
Open Source Code | Yes | Code is at https://github.com/VITA-Group/Structure-LTH.
Open Datasets | Yes | Specifically, we adopt Wide-ResNet-32-2 (Zagoruyko & Komodakis, 2016) (or WRN-32-2), ResNet-18 (He et al., 2016) (or RN-18), MobileNet-v1 (or MBNet-v1) (Howard et al., 2017), and VGG-16 (Simonyan & Zisserman, 2014) on both CIFAR-10 (Krizhevsky et al., 2009) and CIFAR-100 datasets. ResNet-50 (or RN-50) is evaluated on both Tiny-ImageNet (Le & Yang, 2015) and ImageNet (Deng et al., 2009) datasets.
Dataset Splits | Yes | We also report the best validation accuracy instead of the best test accuracy. The number of training epochs is increased to 240 and the learning rate is decayed at the 150-th, 180-th, and 210-th epochs.
Hardware Specification | Yes | The GPU we use for profiling is NVIDIA RTX 2080 Ti, with a CUDA version of 10.2 and a cuDNN (Chetlur et al., 2014) version of 7.6.5. (A generic GPU timing sketch follows this table.)
Software Dependencies | Yes | The GPU we use for profiling is NVIDIA RTX 2080 Ti, with a CUDA version of 10.2 and a cuDNN (Chetlur et al., 2014) version of 7.6.5.
Experiment Setup | Yes | Table 1. Implementation details, which follow the standard settings in Ma et al. (2021b): Batch Size 128 (for most), Weight Decay 1e-4 (for most), Learning Rate 0.1, decayed by 0.1 at the 80-th and 120-th of 160 total epochs (for CIFAR), Optimizer SGD (Ruder, 2016) with a momentum of 0.9. (A training-configuration sketch follows this table.)
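
The pseudocode row refers to the paper's Algorithm 1, iterative magnitude pruning (IMP) with weight rewinding to step i. Below is a minimal, generic PyTorch rendering of that procedure for orientation only; it is not the authors' implementation (that lives in the VITA-Group/Structure-LTH repository), and the train_fn callback, the global magnitude criterion, and the prune_rate/rewind_step/rounds defaults are illustrative assumptions.

# Minimal sketch of IMP with weight rewinding (assumptions noted above).
import copy
import torch


def global_magnitude_masks(model, current_masks, prune_rate=0.2):
    # Prune the smallest `prune_rate` fraction of the still-unmasked weights globally.
    scores = torch.cat([
        (p.detach().abs() * current_masks[name]).flatten()
        for name, p in model.named_parameters() if name in current_masks
    ])
    threshold = torch.quantile(scores[scores > 0], prune_rate)
    return {
        name: ((p.detach().abs() > threshold) & current_masks[name].bool()).float()
        for name, p in model.named_parameters() if name in current_masks
    }


def imp_with_rewinding(model, train_fn, rewind_step=1000, rounds=10, prune_rate=0.2):
    # Start with dense masks over all weight tensors (biases are left unpruned here).
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    rewind_state = None
    for _ in range(rounds):
        # `train_fn` is assumed to train the masked model and return a copy of the
        # weights it had at iteration `rewind_step` -- the rewinding checkpoint.
        snapshot = train_fn(model, masks, rewind_step)
        if rewind_state is None:
            rewind_state = copy.deepcopy(snapshot)
        masks = global_magnitude_masks(model, masks, prune_rate)
        # Rewind surviving weights to their step-i values before the next IMP round.
        model.load_state_dict(rewind_state, strict=False)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
    return masks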
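
The experiment-setup row quotes the CIFAR hyperparameters from Table 1. The sketch below simply wires those quoted settings (batch size 128, SGD with momentum 0.9, weight decay 1e-4, learning rate 0.1 decayed by 0.1 at epochs 80 and 120 of 160) into a standard PyTorch training skeleton; the placeholder model and random tensors stand in for the actual backbones and CIFAR loaders, which are not specified here.

# Training-configuration sketch matching the quoted Table 1 settings (placeholders noted above).
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(3 * 32 * 32, 10)  # placeholder for WRN-32-2 / RN-18 / MBNet-v1 / VGG-16
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)  # x0.1 at epochs 80, 120

for epoch in range(160):
    # Stand-in for one pass over a CIFAR loader with batch size 128.
    inputs = torch.randn(128, 3 * 32 * 32)
    targets = torch.randint(0, 10, (128,))
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()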
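
The hardware and software rows describe the profiling platform (RTX 2080 Ti, CUDA 10.2, cuDNN 7.6.5). The snippet below is only one generic way to time GPU inference with CUDA events in PyTorch, offered for orientation; it is not the authors' profiling pipeline, and the torchvision ResNet-50, the batch size, and the iteration counts are assumptions.

# Generic GPU inference-timing sketch (not the paper's profiling code).
import torch
from torchvision.models import resnet50

model = resnet50().cuda().eval()
x = torch.randn(64, 3, 224, 224, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
with torch.no_grad():
    for _ in range(10):   # warm-up so cuDNN selects its kernels before timing
        model(x)
    torch.cuda.synchronize()
    start.record()
    for _ in range(100):
        model(x)
    end.record()
    torch.cuda.synchronize()

print(f"avg forward time: {start.elapsed_time(end) / 100:.2f} ms")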