Optimization based Layer-wise Magnitude-based Pruning for DNN Compression
Authors: Guiying Li, Chao Qian, Chunhui Jiang, Xiaofen Lu, Ke Tang
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that OLMP can achieve the best pruning ratio on LeNet-style models (i.e., 114 times for LeNet-300-100 and 298 times for LeNet-5) compared with some state-of-the-art DNN pruning methods, and can reduce the size of an AlexNet-style network up to 82 times without accuracy loss. |
| Researcher Affiliation | Academia | Guiying Li (1), Chao Qian (1), Chunhui Jiang (1), Xiaofen Lu (2), Ke Tang (3). (1) Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei 230027, China; (2) CERCIA, School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK; (3) Shenzhen Key Lab of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China |
| Pseudocode | No | The paper describes the proposed method conceptually and with flowcharts (Figure 1), but does not provide structured pseudocode or an algorithm block; a minimal sketch of the pruning step it describes appears after this table. |
| Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the release of source code for the methodology described. |
| Open Datasets | Yes | For the data sets and models, LeNet-5 and LeNet-300-100 are trained on MNIST, and AlexNet-Caltech is trained on Caltech-256 [Griffin et al., 2006]. |
| Dataset Splits | Yes | For MNIST, a validation set of 6,000 samples is selected uniformly at random from the 60,000-sample training set; the remaining samples form the new training set, and the 10,000-sample test set is untouched. For Caltech-256, the training set (15,420 samples) is untouched, while the validation set (15,187 samples) is divided uniformly at random into a new validation set (7,530 samples) and a new test set (7,657 samples). A sketch of these splits appears after the table. |
| Hardware Specification | Yes | All of the experiments are based on Caffe [Jia et al., 2014] and the released projects of DS and NCS, and run on a workstation with one Titan X Pascal GPU and dual Intel E5-2683 v3 @ 2.0 GHz CPUs. |
| Software Dependencies | No | The paper mentions that experiments are based on 'Caffe' and 'released projects of DS and NCS', but it does not specify any version numbers for these software components. |
| Experiment Setup | Yes | For the LeNet experiments, the hyper-parameters (K, pruning loops, pop N, Tmax) are set to (1000, 15, 10, 160). For LeNet-300-100, (δ, σ) is set to (8%, 5); for LeNet-5, δ is set to 5%, and σ is set to 5 in the first 10 pruning loops and 0.5 thereafter. For AlexNet-Caltech, the reference model is trained from scratch for 10,000 iterations using SGD, largely following the experimental settings in [Zeiler and Fergus, 2014] except that the batch size is 256; the same SGD settings are also used during retraining. Its (K, pruning loops, pop N, Tmax, δ) is set to (250, 40, 8, 200, 8%), and σ is set to 5 in the first 28 pruning loops and 0.5 thereafter. These settings are collected in a sketch after the table. |
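
Since the paper gives no pseudocode (see the Pseudocode row), the following is a minimal sketch of the operation OLMP tunes: layer-wise magnitude-based pruning, where each layer has its own threshold and the thresholds are adjusted by a derivative-free optimizer so that the model shrinks as much as possible while the accuracy loss stays within the tolerance δ. The function name and the dictionary-based weight representation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def layerwise_magnitude_prune(weights, thresholds):
    """Zero out weights whose magnitude falls below their layer's threshold.

    `weights` maps layer names to weight arrays; `thresholds` maps the same
    names to non-negative per-layer thresholds -- the quantities that OLMP
    searches over with a derivative-free optimizer.
    """
    pruned = {}
    for name, w in weights.items():
        mask = np.abs(w) >= thresholds[name]  # keep only large-magnitude weights
        pruned[name] = w * mask
    return pruned
```

The pruning step itself is just this element-wise masking; the contribution of OLMP lies in how the per-layer thresholds are chosen, which the sketch does not attempt to reproduce.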
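The splits in the Dataset Splits row can be mirrored with a few lines of index bookkeeping. The random seed and the index-array representation below are assumptions, since the paper reports neither.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily; the paper reports none

# MNIST: hold out 6,000 of the 60,000 training samples, uniformly at random,
# as a validation set; the rest form the new training set. The 10,000-sample
# test set is left untouched.
perm = rng.permutation(60_000)
mnist_val_idx, mnist_train_idx = perm[:6_000], perm[6_000:]

# Caltech-256: the original validation set (15,187 samples) is split uniformly
# at random into a new validation set (7,530) and a new test set (7,657);
# the 15,420-sample training set is left untouched.
perm_c = rng.permutation(15_187)
caltech_val_idx, caltech_test_idx = perm_c[:7_530], perm_c[7_530:]
```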
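Finally, the settings quoted in the Experiment Setup row are collected below as plain Python dictionaries for quick reference. Key names mirror the paper's notation (K, pruning loops, pop N, Tmax, δ, σ); representing σ as a per-loop schedule function is an interpretation of the quoted text, not the authors' configuration.

```python
# LeNet-300-100: fixed sigma throughout the 15 pruning loops.
lenet_300_100 = {"K": 1000, "pruning_loops": 15, "pop_N": 10, "T_max": 160,
                 "delta": 0.08, "sigma": 5.0}

def lenet5_sigma(loop):
    """sigma = 5 for the first 10 pruning loops, 0.5 afterwards."""
    return 5.0 if loop < 10 else 0.5

lenet_5 = {"K": 1000, "pruning_loops": 15, "pop_N": 10, "T_max": 160,
           "delta": 0.05, "sigma": lenet5_sigma}

def alexnet_sigma(loop):
    """sigma = 5 for the first 28 pruning loops, 0.5 afterwards."""
    return 5.0 if loop < 28 else 0.5

alexnet_caltech = {"K": 250, "pruning_loops": 40, "pop_N": 8, "T_max": 200,
                   "delta": 0.08, "sigma": alexnet_sigma}
```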