Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
Authors: Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), that advances state of the art. Perhaps most impressively, its sparse-to-sparse version for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods with ResNet-50 on ImageNet without extending the training time. We release all codes in https://github.com/Shiweiliuiiiiiii/GraNet. (A hedged sketch of the gradual-pruning schedule GraNet builds on follows the table.) |
| Researcher Affiliation | Collaboration | 1 Eindhoven University of Technology, 2 University of Texas at Austin, 3 University of Twente, 4 University of Leeds, 5 JD Explore Academy, 6 University of Jyväskylä |
| Pseudocode | Yes | See Appendix B.1 for the pseudocode of GraNet. |
| Open Source Code | Yes | We release all codes in https://github.com/Shiweiliuiiiiiii/GraNet. |
| Open Datasets | Yes | We choose two commonly used architectures to study pruning plasticity, VGG-19 [58] with batch normalization on CIFAR-10 [27], and ResNet-20 [20] on CIFAR-10. ... ResNet-50 on ImageNet |
| Dataset Splits | No | The paper specifies training epochs and learning rate schedules in Table 1 but does not explicitly detail validation dataset splits or how validation was performed in the main text. It primarily reports test accuracy. |
| Hardware Specification | Yes | All accuracies are in line with the baselines reported in the references [8, 11, 67, 9, 37]. We use standard implementations and hyperparameters available online, with the exception of the small batch size for the ResNet-50 on ImageNet due to the limited hardware resources (2 Tesla V100). |
| Software Dependencies | No | The paper mentions "reproduced by our implementation with PyTorch" but does not specify the version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Table 1: Summary of the architectures and hyperparameters we study in this paper. Example row: Model ResNet-20; Data CIFAR-10; #Epochs 160; Batch Size 128; LR 0.1 (β = 0.9); LR Decay ÷10 at epochs [80, 120]; Weight Decay 0.0005; Test Accuracy 92.41 ± 0.04. (A PyTorch sketch of this recipe follows the table.) |
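
For the "Research Type" row above, the following is a minimal, hedged sketch (not the authors' released code) of the cubic gradual-pruning sparsity schedule that gradual magnitude pruning uses and that GraNet builds on, written with a free initial sparsity `s_i` so the same formula covers a dense-to-sparse run (`s_i = 0`) and a sparse-to-sparse run (`s_i > 0`). The function name `granet_sparsity` and the argument names are illustrative, not taken from the repository.

```python
# Hedged sketch: cubic gradual-pruning sparsity schedule (assumed form).
def granet_sparsity(step: int, s_i: float, s_f: float,
                    t_start: int, t_end: int, dt: int) -> float:
    """Return the target sparsity at a given training step.

    step    : current training step
    s_i     : initial sparsity (0.0 for the dense-to-sparse variant)
    s_f     : final sparsity
    t_start : step at which gradual pruning begins
    t_end   : step at which the final sparsity is reached
    dt      : interval (in steps) between pruning events
    """
    if step < t_start:
        return s_i
    if step >= t_end:
        return s_f
    # The sparsity level is only updated every `dt` steps.
    elapsed = ((step - t_start) // dt) * dt
    progress = elapsed / (t_end - t_start)
    return s_f + (s_i - s_f) * (1.0 - progress) ** 3


# Example: a sparse-to-sparse run that goes from 50% to 90% sparsity
# over the first 30,000 steps, pruning every 1,000 steps.
print(granet_sparsity(step=15_000, s_i=0.5, s_f=0.9,
                      t_start=0, t_end=30_000, dt=1_000))  # -> 0.85
```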
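
For the "Experiment Setup" row, here is a minimal PyTorch sketch of the Table 1 recipe for ResNet-20 on CIFAR-10: SGD with LR 0.1, momentum 0.9, weight decay 0.0005, and the LR divided by 10 at epochs 80 and 120 of a 160-epoch run. `build_optimizer_and_scheduler` is an assumed helper name, and `model` stands for any ResNet-20 implementation, which is not reproduced here.

```python
import torch.nn as nn
import torch.optim as optim

def build_optimizer_and_scheduler(model: nn.Module):
    """Hedged sketch of the Table 1 training recipe for ResNet-20 on CIFAR-10."""
    optimizer = optim.SGD(
        model.parameters(),
        lr=0.1,             # initial learning rate (Table 1)
        momentum=0.9,       # beta = 0.9 (Table 1)
        weight_decay=5e-4,  # weight decay 0.0005 (Table 1)
    )
    # Divide the learning rate by 10 at epochs 80 and 120 of the 160-epoch run.
    scheduler = optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80, 120], gamma=0.1
    )
    return optimizer, scheduler
```

Usage: call `scheduler.step()` once per epoch after the training loop for that epoch; the batch size of 128 is set in the DataLoader, not in this helper.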