Pruning from Scratch
Authors: Yulong Wang, Xiaolu Zhang, Lingxi Xie, Jun Zhou, Hang Su, Bo Zhang, Xiaolin Hu
AAAI 2020, pp. 12273–12280 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments for compressing classification models on CIFAR10 and ImageNet datasets, our approach not only greatly reduces the pre-training burden of traditional pruning methods, but also achieves similar or even higher accuracy under the same computation budgets. |
| Researcher Affiliation | Collaboration | ¹Tsinghua University, ²Ant Financial, ³Huawei Noah's Ark Lab |
| Pseudocode | Yes | Algorithm 1 Searching For Pruned Structure |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for their method is open-source or publicly available. |
| Open Datasets | Yes | Extensive experiments on CIFAR10 (Krizhevsky and others 2009) and ImageNet (Russakovsky et al. 2015) show that our method yields at least 10× and 100× searching speedup while achieving comparable or even better model accuracy than traditional pruning methods using complicated strategies. |
| Dataset Splits | Yes | Specifically, we randomly select 5,000 images from the original CIFAR10 training set for validation. For ImageNet, we randomly select 50,000 images (50 images for each category) from the original training set for validation. (A hold-out sketch follows the table.) |
| Hardware Specification | Yes | We measure all model search time on a single NVIDIA GeForce GTX TITAN Xp GPU. When pruning ResNet56 on the CIFAR10 dataset, NS and AMC take 2.3 hours and 1.0 hours, respectively, and our pipeline only takes 0.12 hours. When pruning ResNet50 on the ImageNet dataset, NS takes approximately 310 hours (including the progressive training process) to complete the entire pruning process. For AMC, although the pruning phase takes about 3.1 hours, a pre-trained full model is required, which is equivalent to about 300 hours of pre-training. Our pipeline takes only 2.8 hours to obtain the pruned structure from a randomly initialized network. These results illustrate the superior pruning speed of our method. We also measure the model CPU latency under batch size 1 on a server with two 2.40GHz Intel(R) Xeon(R) E5-2680 v4 CPUs. |
| Software Dependencies | No | The paper mentions optimizers like Adam and SGD, and a cosine learning rate scheduler, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | When learning channel importance for the models on the CIFAR10 dataset, we use the Adam optimizer with an initial learning rate of 0.01 and a batch size of 128. The balance factor γ = 0.5 and the total number of epochs is 10. All the models are expanded by 1.25×, and the predefined sparsity ratio r equals the percentage of the pruned model's FLOPs to the full model. After searching for the pruned network architecture, we train the pruned model from scratch following the same parameter settings and training schedule in (He et al. 2018a). When learning channel importance for the models on the ImageNet dataset, we use the Adam optimizer with an initial learning rate of 0.01 and a batch size of 100. The balance factor γ = 0.05 and the total number of epochs is 1. During training, we evaluate the model performance on the validation set multiple times. After finishing the architecture search, we train the pruned model from scratch using the SGD optimizer. For MobileNets, we use a cosine learning rate scheduler (Loshchilov and Hutter 2016) with an initial learning rate of 0.05, momentum of 0.9, and weight decay of 4 × 10⁻⁵. The model is trained for 300 epochs with a batch size of 256. For ResNet50 models, we follow the same hyper-parameter settings in (He et al. 2016). To further improve the performance, we add label smoothing (Szegedy et al. 2016) regularization to the total loss. (A configuration sketch follows the table.) |
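
Referenced from the Dataset Splits row: a minimal sketch of the CIFAR10 hold-out procedure, assuming PyTorch/torchvision and an arbitrary random seed (the paper does not publish the sampling code or seed). The ImageNet split (50 images per category) would follow the same pattern with per-class sampling, which is omitted here.

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Full CIFAR10 training set (50,000 images).
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

# Randomly hold out 5,000 training images for validation, as described
# in the Dataset Splits row. The seed is an assumption, not from the paper.
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(len(full_train), generator=generator)
val_set = Subset(full_train, perm[:5000].tolist())
train_set = Subset(full_train, perm[5000:].tolist())
```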
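Referenced from the Experiment Setup row: a minimal sketch of the quoted hyper-parameter settings in PyTorch. The placeholder `model`, the label-smoothing value of 0.1, and the surrounding training loop are assumptions, since the authors do not release code; only the optimizer, scheduler, and hyper-parameter values come from the row above.

```python
import torch

# Placeholder module standing in for the expanded/pruned network (not the paper's model).
model = torch.nn.Linear(8, 8)

# Channel-importance learning on CIFAR10: Adam, initial lr 0.01,
# batch size 128, balance factor gamma = 0.5, 10 epochs.
gate_optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
gamma, gate_epochs, gate_batch_size = 0.5, 10, 128

# Retraining the pruned MobileNet on ImageNet: SGD with a cosine learning-rate
# schedule, initial lr 0.05, momentum 0.9, weight decay 4e-5, 300 epochs, batch size 256.
sgd = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=4e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(sgd, T_max=300)

# Label-smoothing regularization added to the loss for ResNet50; the smoothing
# value 0.1 is an assumption (the paper only cites Szegedy et al. 2016).
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```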