Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
Authors: Xiaohan Ding, Guiguang Ding, Xiangxin Zhou, Yuchen Guo, Jungong Han, Ji Liu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate GSM by pruning several common benchmark models on MNIST, CIFAR-10 [29] and ImageNet [9], and comparing with the reported results from several recent competitors. For each trial, we start from a well-trained base model and apply GSM training on all the layers simultaneously. |
| Researcher Affiliation | Collaboration | 1 Beijing National Research Center for Information Science and Technology (BNRist); School of Software, Tsinghua University, Beijing, China 2 Department of Electronic Engineering, Tsinghua University, Beijing, China 3 Department of Automation, Tsinghua University; Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing, China 4 WMG Data Science, University of Warwick, Coventry, United Kingdom 5 Kwai Seattle AI Lab, Kwai FeDA Lab, Kwai AI Platform |
| Pseudocode | No | The paper provides mathematical formulations (e.g., Equations 1, 9, and 10) for its update rules but does not include a distinct pseudocode or algorithm block; an illustrative sketch of the update is given after the table. |
| Open Source Code | Yes | The codes are available at https://github.com/DingXiaoH/GSM-SGD. |
| Open Datasets | Yes | We evaluate GSM by pruning several common benchmark models on MNIST, CIFAR-10 [29] and ImageNet [9] |
| Dataset Splits | Yes | After GSM training, we conduct lossless pruning and test on the validation dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | For MNIST: 'We use momentum coefficient β = 0.99 and a batch size of 256. The learning rate schedule is α = 3×10⁻², 3×10⁻³, 3×10⁻⁴ for 160, 40 and 40 epochs, respectively.' For CIFAR-10: 'We use β = 0.98, a batch size of 64 and learning rate α = 5×10⁻³, 5×10⁻⁴, 5×10⁻⁵ for 400, 100 and 100 epochs, respectively.' These settings are collected in the configuration sketch after the table. |
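
The update rules referenced in the Pseudocode row describe momentum SGD in which the gradient term is applied only to the parameters whose first-order importance score |w · ∂L/∂w| ranks in the global top Q, while all remaining parameters receive only momentum and weight decay and therefore drift toward zero before the final lossless pruning. The sketch below is a minimal PyTorch illustration of that idea under those assumptions; the function name `gsm_step`, the `keep_ratio` argument, and the toy usage are illustrative choices, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of a global-sparse-momentum update (assumed names, not the official code).
import torch


@torch.no_grad()
def gsm_step(params, momenta, lr, beta, weight_decay, keep_ratio):
    """Apply gradients only to the globally top-scoring parameters;
    the rest receive only momentum and weight decay."""
    # First-order importance score |w * dL/dw| for every parameter, pooled across all layers.
    scores = torch.cat([(p * p.grad).abs().flatten() for p in params])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()

    for p, z in zip(params, momenta):
        mask = ((p * p.grad).abs() >= threshold).float()          # binary selection mask
        z.mul_(beta).add_(weight_decay * p).add_(mask * p.grad)   # z <- beta*z + eta*w + mask*grad
        p.sub_(lr * z)                                            # w <- w - lr*z


# Toy usage (purely illustrative):
model = torch.nn.Linear(784, 10)
params = [p for p in model.parameters() if p.requires_grad]
momenta = [torch.zeros_like(p) for p in params]

x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
torch.nn.functional.cross_entropy(model(x), y).backward()
gsm_step(params, momenta, lr=5e-3, beta=0.98, weight_decay=1e-4, keep_ratio=0.1)
```

After enough such steps the non-selected weights are small enough to be removed, which corresponds to the "lossless pruning" step quoted in the Dataset Splits row.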
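
For reference, the hyperparameters quoted in the Experiment Setup row can be grouped into a small configuration mapping. The dictionary layout and key names below are assumptions made for readability; only the numerical values come from the quoted setup.

```python
# Hyperparameters from the Experiment Setup row; layout and key names are illustrative.
GSM_SETUPS = {
    "mnist": {
        "momentum_beta": 0.99,
        "batch_size": 256,
        "lr_schedule": [(3e-2, 160), (3e-3, 40), (3e-4, 40)],    # (learning rate, epochs)
    },
    "cifar10": {
        "momentum_beta": 0.98,
        "batch_size": 64,
        "lr_schedule": [(5e-3, 400), (5e-4, 100), (5e-5, 100)],  # (learning rate, epochs)
    },
}
```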