Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets
Authors: Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, conducted on diverse datasets across multiple network backbones, consistently validate our proposal, showing that the hardware acceleration roadblock of LTH is now removed. Specifically, the structural winning tickets obtain up to {64.93%, 64.84%, 60.23%} running time savings at {36%~80%, 74%, 58%} sparsity on {CIFAR, Tiny-ImageNet, ImageNet}, while maintaining comparable accuracy. |
| Researcher Affiliation | Academia | ¹University of Texas at Austin, ²Northeastern University. Correspondence to: Zhangyang Wang <atlaswang@utexas.edu>. |
| Pseudocode | Yes | Algorithm 1: IMP with rewinding step i (a minimal sketch of this procedure follows the table) |
| Open Source Code | Yes | Code is at https://github.com/VITA-Group/Structure-LTH. |
| Open Datasets | Yes | Specifically, we adopt Wide-ResNet-32-2 (Zagoruyko & Komodakis, 2016) (or WRN-32-2), ResNet-18 (He et al., 2016) (or RN-18), MobileNet-v1 (or MBNet-v1) (Howard et al., 2017), and VGG-16 (Simonyan & Zisserman, 2014) on both CIFAR-10 (Krizhevsky et al., 2009) and CIFAR-100 datasets. ResNet-50 (or RN-50) is evaluated on both Tiny-ImageNet (Le & Yang, 2015) and ImageNet (Deng et al., 2009) datasets. |
| Dataset Splits | Yes | We also report the best validation accuracy instead of the best test accuracy. The number of training epochs is increased to 240 and the learning rate is decayed at the 150th, 180th, and 210th epochs. |
| Hardware Specification | Yes | The GPU we use for profiling is NVIDIA RTX 2080 Ti, with a CUDA version of 10.2 and a cuDNN (Chetlur et al., 2014) version of 7.6.5. |
| Software Dependencies | Yes | The GPU we use for profiling is NVIDIA RTX 2080 Ti, with a CUDA version of 10.2 and a cuDNN (Chetlur et al., 2014) version of 7.6.5. |
| Experiment Setup | Yes | Table 1. Implementation details which follow the standard settings in Ma et al. (2021b): Batch Size 128 (for most); Weight Decay 1e-4 (for most); Learning Rate 0.1, decayed by 0.1 at epochs 80 and 120 of 160 total epochs (for CIFAR); Optimizer SGD (Ruder, 2016) with a momentum of 0.9. (A configuration sketch also follows the table.) |
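
The Pseudocode row refers to the paper's Algorithm 1, IMP (Iterative Magnitude Pruning) with rewinding step i. Below is a minimal sketch of that general procedure, not the authors' released code: it assumes a PyTorch model and a user-supplied `train_fn(model, mask, num_steps)` that applies the sparsity mask after every optimizer step; the names `imp_with_rewinding`, `prune_rate`, and `rounds` are illustrative.

```python
import copy
import torch


def imp_with_rewinding(model, train_fn, rewind_step, prune_rate=0.2, rounds=10):
    """Iteratively prune a fraction of the remaining weights per round and
    rewind the surviving weights to their values at `rewind_step`."""
    # Prune only multi-dimensional weight tensors (skip biases / BN params).
    mask = {name: torch.ones_like(p) for name, p in model.named_parameters()
            if p.dim() > 1}

    # Train briefly up to the rewind point and snapshot the weights there.
    train_fn(model, mask, num_steps=rewind_step)
    rewind_state = copy.deepcopy(model.state_dict())

    for _ in range(rounds):
        # Train the currently masked (sparse) network to convergence;
        # train_fn is assumed to re-apply `mask` after each update.
        train_fn(model, mask, num_steps=None)

        # Global magnitude pruning over the weights that are still alive.
        alive = torch.cat([(p.detach() * mask[n]).abs().flatten()
                           for n, p in model.named_parameters() if n in mask])
        alive = alive[alive > 0]
        k = max(1, int(prune_rate * alive.numel()))
        threshold = torch.kthvalue(alive, k).values

        for n, p in model.named_parameters():
            if n in mask:
                mask[n] *= (p.detach().abs() > threshold).float()

        # Rewind every surviving weight to its early-training value.
        model.load_state_dict(rewind_state)

    return mask
```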
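
For the Experiment Setup row, the quoted Table 1 hyperparameters (SGD with momentum 0.9, weight decay 1e-4, batch size 128, learning rate 0.1 decayed by 0.1 at epochs 80 and 120 of 160 for CIFAR) map onto standard PyTorch training objects. The sketch below is an assumed translation of those settings, not the released training script; `model` and `train_set` are placeholders for an existing network and CIFAR dataset.

```python
import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # Batch Size 128

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,              # initial learning rate for CIFAR
    momentum=0.9,        # SGD momentum quoted in Table 1
    weight_decay=1e-4,   # weight decay "for most" settings
)

# Decay the learning rate by 0.1 at epochs 80 and 120 of 160 total (CIFAR).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1
)

for epoch in range(160):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```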