Optimization based Layer-wise Magnitude-based Pruning for DNN Compression

Authors: Guiying Li, Chao Qian, Chunhui Jiang, Xiaofen Lu, Ke Tang

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that OLMP can achieve the best pruning ratio on LeNet-style models (i.e., 114 times for LeNet-300-100 and 298 times for LeNet-5) compared with some state-of-the-art DNN pruning methods, and can reduce the size of an AlexNet-style network up to 82 times without accuracy loss.
Researcher Affiliation | Academia | Guiying Li (1), Chao Qian (1), Chunhui Jiang (1), Xiaofen Lu (2), Ke Tang (3). (1) Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei 230027, China; (2) CERCIA, School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK; (3) Shenzhen Key Lab of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China.
Pseudocode | No | The paper describes the proposed method conceptually and with flowcharts (Figure 1), but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the release of source code for the methodology described.
Open Datasets | Yes | For the data sets and models, LeNet-5 and LeNet-300-100 are trained on MNIST, and AlexNet-Caltech is trained on Caltech-256 [Griffin et al., 2006].
Dataset Splits | Yes | For MNIST, a validation set containing 6,000 samples is selected uniformly at random from the training set (60,000 in total); the remaining samples form the new training set; the test set (10,000 in total) is untouched. For Caltech-256, the training set (15,420 in total) is untouched, while the validation set (15,187 in total) is uniformly randomly divided into a new validation set (7,530 in total) and a new test set (7,657 in total). A sketch of these splits follows the table.
Hardware Specification | Yes | All of the experiments are based on Caffe [Jia et al., 2014] and released projects of DS and NCS, and run on a workstation with one Titan X Pascal GPU and dual Intel E5-2683 v3 @ 2.0 GHz CPUs.
Software Dependencies | No | The paper mentions that experiments are based on Caffe and the released projects of DS and NCS, but it does not specify version numbers for these software components.
Experiment Setup | Yes | For the LeNet experiments, the hyper-parameters (K, pruning loops, pop N, Tmax) are set to (1000, 15, 10, 160). For LeNet-300-100, (δ, σ) is set to (8%, 5); for LeNet-5, δ is set to 5%, and σ is set to 5 in the first 10 pruning loops and 0.5 thereafter. For AlexNet-Caltech, the reference model is trained from scratch for 10,000 iterations using SGD, mainly following the experimental settings in [Zeiler and Fergus, 2014] except that the batch size is 256; the same SGD settings are also used during retraining. Its (K, pruning loops, pop N, Tmax, δ) is set to (250, 40, 8, 200, 8%), and σ is set to 5 in the first 28 pruning loops and 0.5 thereafter. These values are collected in the configuration sketch below.
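
The dataset splits quoted above amount to a single uniform random permutation of indices per dataset. Below is a minimal Python sketch, not the authors' code: the function names and the fixed seed are assumptions for illustration; only the set sizes come from the paper.

```python
import numpy as np

def split_mnist_train(num_train=60_000, val_size=6_000, seed=0):
    """Draw a 6,000-sample validation set uniformly at random from the 60,000
    MNIST training samples; the rest form the new training set.
    (The 10,000-sample test set is left untouched.)"""
    rng = np.random.default_rng(seed)        # seed is an assumption; the paper gives none
    perm = rng.permutation(num_train)
    return perm[val_size:], perm[:val_size]  # (new training indices, validation indices)

def split_caltech_val(num_val=15_187, new_val_size=7_530, seed=0):
    """Split the original Caltech-256 validation set (15,187 images) uniformly at
    random into a new validation set (7,530) and a new test set (7,657).
    (The 15,420-image training set is left untouched.)"""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_val)
    return perm[:new_val_size], perm[new_val_size:]

train_idx, val_idx = split_mnist_train()
new_val_idx, new_test_idx = split_caltech_val()
assert len(train_idx) == 54_000 and len(val_idx) == 6_000
assert len(new_val_idx) == 7_530 and len(new_test_idx) == 7_657
```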
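
For reference, the hyper-parameter values quoted in the Experiment Setup row can be collected into plain configuration dictionaries. This is only a restatement of the reported numbers; the key names (K, pruning_loops, pop_N, T_max, delta, sigma) are labels chosen here and do not correspond to identifiers in any released code.

```python
# Hyper-parameters as reported in the paper; key names are illustrative labels.
LENET_300_100 = {
    "K": 1000, "pruning_loops": 15, "pop_N": 10, "T_max": 160,
    "delta": 0.08,                      # 8%
    "sigma": 5.0,                       # constant over all pruning loops
}

LENET_5 = {
    "K": 1000, "pruning_loops": 15, "pop_N": 10, "T_max": 160,
    "delta": 0.05,                      # 5%
    "sigma": [5.0] * 10 + [0.5] * 5,    # 5 for the first 10 loops, 0.5 afterwards
}

ALEXNET_CALTECH = {
    "K": 250, "pruning_loops": 40, "pop_N": 8, "T_max": 200,
    "delta": 0.08,                      # 8%
    "sigma": [5.0] * 28 + [0.5] * 12,   # 5 for the first 28 loops, 0.5 afterwards
    # Reference-model training and retraining: SGD for 10,000 iterations with
    # batch size 256, otherwise following [Zeiler and Fergus, 2014].
    "sgd": {"iterations": 10_000, "batch_size": 256},
}
```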