Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Authors: Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), that advances state of the art. Perhaps most impressively, its sparse-to-sparse version for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods with ResNet-50 on ImageNet without extending the training time. We release all codes in https://github.com/Shiweiliuiiiiiii/GraNet.
Researcher Affiliation Collaboration 1 Eindhoven University of Technology, 2 University of Texas at Austin, 3 University of Twente, 4 University of Leeds, 5 JD Explore Academy, 6 University of Jyväskylä
Pseudocode Yes See Appendix B.1 for the pseudocode of GraNet. (A rough, illustrative sketch of the prune-and-regenerate step is given below the table.)
Open Source Code Yes We release all codes in https://github.com/Shiweiliuiiiiiii/GraNet.
Open Datasets Yes We choose two commonly used architectures to study pruning plasticity, VGG-19 [58] with batch normalization on CIFAR-10 [27], and ResNet-20 [20] on CIFAR-10. ... ResNet-50 on ImageNet
Dataset Splits No The paper specifies training epochs and learning rate schedules in Table 1 but does not explicitly detail validation dataset splits or how validation was performed in the main text. It primarily reports test accuracy.
Hardware Specification Yes All accuracies are in line with the baselines reported in the references [8, 11, 67, 9, 37]. We use standard implementations and hyperparameters available online, with the exception of the small batch size for the ResNet-50 on ImageNet due to the limited hardware resources (2 Tesla V100).
Software Dependencies No The paper mentions "reproduced by our implementation with PyTorch" but does not specify the version number for PyTorch or any other software dependencies.
Experiment Setup Yes Table 1: Summary of the architectures and hyperparameters we study in this paper. For ResNet-20 on CIFAR-10: 160 epochs, batch size 128, LR 0.1 (momentum β = 0.9), LR decayed 10× at epochs [80, 120], weight decay 0.0005, test accuracy 92.41 ± 0.04. (A minimal training configuration matching this row is sketched below the table.)
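
The Pseudocode row above points to Appendix B.1 and the released repository for the exact GraNet procedure. Purely as a reading aid, the sketch below shows the general shape of one gradual-magnitude-pruning step followed by zero-cost neuroregeneration on a single weight tensor: the sparsity target follows the usual cubic schedule, the smallest-magnitude weights are pruned, and an equal number of connections are dropped (lowest magnitude) and regrown (largest gradient magnitude) so overall sparsity is unchanged. The function names (`cubic_sparsity`, `granet_step`), the per-tensor rather than global sparsity, and the fixed `regen_frac` are illustrative assumptions, not the authors' implementation.

```python
import torch


def cubic_sparsity(step, s_init, s_final, t_start, t_end):
    """Cubic sparsity schedule (Zhu & Gupta style): ramps the target
    sparsity from s_init to s_final between steps t_start and t_end."""
    if step <= t_start:
        return s_init
    if step >= t_end:
        return s_final
    frac = (step - t_start) / (t_end - t_start)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


def granet_step(weight, grad, mask, sparsity, regen_frac=0.5):
    """One illustrative prune-and-regenerate update on a single tensor.

    1) Magnitude-prune the currently active weights down to `sparsity`.
    2) Zero-cost neuroregeneration: drop a further `regen_frac` share of
       the smallest-magnitude survivors and regrow the same number of
       inactive connections with the largest |gradient|, so the overall
       sparsity stays unchanged.
    """
    n_keep = int(weight.numel() * (1.0 - sparsity))

    # Gradual magnitude pruning to the scheduled sparsity level.
    scores = (weight * mask).abs().flatten()
    keep_idx = torch.topk(scores, n_keep).indices
    mask = torch.zeros_like(mask).flatten()
    mask[keep_idx] = 1.0

    # Neuroregeneration: prune low-magnitude / grow high-gradient.
    n_regen = min(int(regen_frac * n_keep), weight.numel() - n_keep)
    if n_regen > 0:
        active = weight.abs().flatten().clone()
        active[mask == 0] = float("inf")          # consider only active weights
        drop_idx = torch.topk(active, n_regen, largest=False).indices

        dead_grad = grad.abs().flatten().clone()
        dead_grad[mask == 1] = -float("inf")      # consider only inactive weights
        grow_idx = torch.topk(dead_grad, n_regen).indices

        mask[drop_idx] = 0.0
        mask[grow_idx] = 1.0

    return mask.view_as(weight)


# Illustrative usage on a random layer.
w = torch.randn(256, 512, requires_grad=True)
(w ** 2).sum().backward()
new_mask = granet_step(w.data, w.grad, torch.ones_like(w),
                       sparsity=cubic_sparsity(5, 0.5, 0.9, 0, 30))
```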
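
The Experiment Setup row quotes the ResNet-20/CIFAR-10 hyperparameters from Table 1. A minimal PyTorch training configuration matching that row might look like the following; this is our reading of the table, not the authors' released code, and both the ResNet-18 stand-in (torchvision ships no ResNet-20) and the CIFAR-10 normalization statistics are assumptions.

```python
import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR-10 augmentation; the normalization stats are the commonly
# published values, not taken from the paper.
transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                         transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)

model = torchvision.models.resnet18(num_classes=10)  # stand-in for ResNet-20
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=5e-4)
# Divide the LR by 10 at epochs 80 and 120; train for 160 epochs (Table 1).
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120],
                                                 gamma=0.1)

for epoch in range(160):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```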