Neural Pruning via Growing Regularization

Authors: Huan Wang, Can Qin, Yulun Zhang, Yun Fu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4, Experimental Results: "Datasets and networks. We first conduct analyses on the CIFAR10/100 datasets (Krizhevsky, 2009) with ResNet56 (He et al., 2016) / VGG19 (Simonyan & Zisserman, 2015). Then we evaluate our methods on the large-scale ImageNet dataset (Deng et al., 2009) with ResNet34 and ResNet50 (He et al., 2016). For CIFAR datasets, we train our baseline models with accuracies comparable to those in the original papers. For ImageNet, we take the official PyTorch (Paszke et al., 2019) pre-trained models as baselines to maintain comparability with other methods."
Researcher Affiliation | Academia | Huan Wang, Can Qin, Yulun Zhang, Yun Fu; Northeastern University, Boston, MA, USA; {wang.huan, qin.ca}@northeastern.edu, yulun100@gmail.com, yunfu@ece.neu.edu
Pseudocode | Yes | Algorithm 1: the GReg-1 and GReg-2 algorithms (a hedged sketch of the growing-regularization idea appears after this table).
Open Source Code | Yes | "Our code and trained models are publicly available at https://github.com/mingsuntse/regularization-pruning."
Open Datasets | Yes | "Datasets and networks. We first conduct analyses on the CIFAR10/100 datasets (Krizhevsky, 2009) with ResNet56 (He et al., 2016) / VGG19 (Simonyan & Zisserman, 2015). Then we evaluate our methods on the large-scale ImageNet dataset (Deng et al., 2009) with ResNet34 and ResNet50 (He et al., 2016)."
Dataset Splits | Yes | "Datasets and networks. We first conduct analyses on the CIFAR10/100 datasets (Krizhevsky, 2009) with ResNet56 (He et al., 2016) / VGG19 (Simonyan & Zisserman, 2015). Then we evaluate our methods on the large-scale ImageNet dataset (Deng et al., 2009) with ResNet34 and ResNet50 (He et al., 2016). For ImageNet, we take the official PyTorch (Paszke et al., 2019) pre-trained models as baselines to maintain comparability with other methods."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) but does not provide specific version numbers for PyTorch or any other software dependencies used in the experiments.
Experiment Setup | Yes | "Training settings. To control the irrelevant factors as much as we can, for comparison methods that release their pruning ratios, we adopt their ratios; otherwise, we use our specified ones. We compare the speedup (measured by FLOPs reduction) since we mainly target model acceleration rather than compression. Detailed training settings (e.g., hyper-parameters and layer pruning ratios) are summarized in the Appendix." Table 5: Training setting summary; for the SGD solver, the momentum and weight decay are given in parentheses; for ImageNet, batch size 64 is used for pruning instead of the standard 256 to save training time. Table 8: Hyper-parameters of our methods. (See the optimizer-setup sketch after this table.)
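
The paper's Algorithm 1 is not reproduced on this page; below is a minimal, hedged sketch of the growing-regularization idea that GReg-1 builds on, written against PyTorch. The filter-selection criterion (L1 norm), the growth interval, the penalty ceiling, and all numeric values are illustrative assumptions, not the authors' exact algorithm or settings.

```python
# Minimal sketch (not the authors' implementation) of pruning with a growing
# L2 regularization penalty, the idea behind GReg-1. All names and numbers
# here (selection by L1 norm, grow_every, lam_ceiling, SGD settings) are
# illustrative assumptions.
import torch
import torch.nn as nn


def pick_filters_to_prune(conv: nn.Conv2d, ratio: float) -> torch.Tensor:
    """Select the output filters with the smallest L1 norms (assumed criterion)."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    k = int(ratio * scores.numel())
    return torch.argsort(scores)[:k]


def prune_with_growing_reg(model, conv, loader, ratio=0.5,
                           delta_lam=1e-4, lam_ceiling=1.0, grow_every=10):
    """Grow an extra L2 penalty on the selected filters until it reaches a
    ceiling, driving their weights toward zero so they can be removed and the
    remaining network finetuned (removal and finetuning are not shown)."""
    prune_idx = pick_filters_to_prune(conv, ratio)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=5e-4)
    lam = 0.0
    for it, (images, labels) in enumerate(loader):
        # Extra L2 penalty applied only to the filters selected for pruning.
        penalty = lam * conv.weight[prune_idx].pow(2).sum()
        loss = criterion(model(images), labels) + penalty
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if it % grow_every == 0 and lam < lam_ceiling:
            lam += delta_lam  # the "growing" regularization factor
    return prune_idx
```

The point the sketch tries to capture is that the penalty is not fixed but increased gradually over training, which is what distinguishes growing regularization from ordinary, constant weight decay.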
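
The Experiment Setup row refers to an SGD solver with momentum and weight decay (Table 5), an official PyTorch pre-trained baseline, and a reduced batch size of 64 for ImageNet pruning. A hedged sketch of such a setup follows; the learning rate, momentum, and weight decay values are placeholders, not the paper's reported hyper-parameters.

```python
# Hedged sketch of a training setup consistent with the paper's description:
# an official PyTorch pre-trained ResNet50 baseline, an SGD solver with
# momentum and weight decay, and batch size 64 for the ImageNet pruning stage
# (the paper notes 64 is used instead of the standard 256 to save time).
# All numeric values are placeholders.
import torch
import torchvision

# Official pre-trained baseline (torchvision >= 0.13 weights API assumed).
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # placeholder learning rate
    momentum=0.9,       # placeholder momentum (Table 5 lists the actual value)
    weight_decay=1e-4,  # placeholder weight decay (Table 5 lists the actual value)
)

prune_batch_size = 64   # smaller than the standard 256, per the paper's note
# loader = torch.utils.data.DataLoader(imagenet_train, batch_size=prune_batch_size,
#                                      shuffle=True)  # imagenet_train not defined here
```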