Efficient DNN Neuron Pruning by Minimizing Layer-wise Nonlinear Reconstruction Error
Authors: Chunhui Jiang, Guiying Li, Chao Qian, Ke Tang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on benchmark DNN models show the superiority of the proposed approach. In this section, the proposed approach is empirically evaluated on three benchmark data sets: MNIST [LeCun et al., 1998], CIFAR-10 [Krizhevsky and Hinton, 2009] and ILSVRC2012 [Russakovsky et al., 2015]. |
| Researcher Affiliation | Academia | Chunhui Jiang¹, Guiying Li¹, Chao Qian¹, Ke Tang². ¹ Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei 230027, China. ² Shenzhen Key Lab of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China. |
| Pseudocode | Yes | Algorithm 1 (The Proposed Approach). Input: {Ŵ(l), k_l : 1 ≤ l ≤ L}, the pre-trained model and the cardinality constraints on the neurons of each layer. Output: {W(l), m(l) : 1 ≤ l ≤ L}, weights and neuron masks of the pruned model. 1: Initialize W(l) ← Ŵ(l), m(l) ← 1, for 1 ≤ l ≤ L. 2: for l = 1 to L−1 do. 3: Extract a 3-layer subnetwork {W(l), W(l+1)} from the pruned model. 4: repeat. 5: Compute sensitivities of neurons by Eq. (4). 6: Update m(l) by Eq. (5) with ‖m(l)‖₀ = k_l. 7: Forward propagation and compute E_{l+1} by Eq. (7). 8: Update W(l+1), W(l) by Eqs. (8) and (9). 9: iter ← iter + 1. 10: until converged or iter = iter_max. 11: end for. 12: Fine-tuning. A hedged Python sketch of this layer-wise loop appears after the table. |
| Open Source Code | No | The paper does not provide any explicit statement about open-source code release or a link to a code repository. |
| Open Datasets | Yes | For MNIST, a 4-layer MLP with neurons 784-500-300-10 is used as in [Wen et al., 2016]. For CIFAR-10, we use a VGGNet variant [Li et al., 2016] which has 13 convolutional layers. ILSVRC2012 is a subset of the huge ImageNet data set and contains over 1.2 million images. |
| Dataset Splits | Yes | For MNIST and CIFAR-10, the accuracy on the test set is reported. For ILSVRC2012, the top-1 accuracy on center 224×224 crop of the validation set is reported. ... The AlexNet baseline is trained using the standard protocol in Caffe. |
| Hardware Specification | Yes | Experiments are conducted on NVIDIA TITAN X (Pascal) graphics card. The speedup is measured on a single-thread Intel Xeon E5-2683 CPU. |
| Software Dependencies | No | The proposed approach is implemented with the Caffe framework [Jia et al., 2014]. No specific version numbers for software components are provided. |
| Experiment Setup | Yes | The MLP baseline is trained with 18,000 iterations and an initial learning rate of 0.1 which is multiplied by 0.1 after 1/3 and 2/3 fraction of training iterations. ... All the baseline training uses a batch size of 128, a momentum of 0.9 and a weight decay of 0.0005. ... For single layer pruning, the proposed method uses 1,500, 400 and 3,000 iterations to minimize NRE for MLP, VGGNet and AlexNet, respectively. The scaling factor λ takes 512 for all the three DNN models. ... We fine-tune the pruned MLP, VGGNet, AlexNet with an initial learning rate of 0.1, 0.01, 0.001 and with 12,000, 32,000, 45,000 iterations, respectively. These numbers are collected in the configuration sketch after the table. |
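
The Algorithm 1 pseudocode quoted above refers to Eqs. (4)-(9), which the table does not reproduce. The NumPy sketch below therefore only mirrors the control flow of the layer-wise loop: the sensitivity score, the top-k mask update, and the plain SGD step on the squared nonlinear reconstruction error are generic stand-ins for those equations, and `prune_layer`, its arguments, and the 784-500-300-10 toy dimensions are illustrative, not the authors' implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def prune_layer(W_l, W_lp1, X, k, iters=200, lr=1e-2):
    """Prune the layer-l neurons of a 3-layer subnetwork down to k, then
    reduce the nonlinear reconstruction error (NRE) against the unpruned output.

    W_l   : (d_in, d_l)  weights producing the layer-l activations
    W_lp1 : (d_l, d_out) weights consuming the layer-l activations
    X     : (n, d_in)    inputs to the subnetwork
    k     : number of layer-l neurons to keep (cardinality constraint k_l)
    """
    target = relu(relu(X @ W_l) @ W_lp1)          # unpruned subnetwork output
    W_l, W_lp1 = W_l.copy(), W_lp1.copy()

    for _ in range(iters):                        # the "repeat ... until" loop
        pre = X @ W_l
        act = relu(pre)
        # Steps 5-6: stand-in sensitivity score (paper: Eq. (4)) and top-k
        # mask update under the constraint ||m(l)||_0 = k_l (paper: Eq. (5)).
        sensitivity = act.mean(axis=0) * np.linalg.norm(W_lp1, axis=1)
        mask = np.zeros(W_l.shape[1])
        mask[np.argsort(sensitivity)[-k:]] = 1.0
        # Step 7: forward propagation of the pruned subnetwork and its NRE.
        h = act * mask
        y = relu(h @ W_lp1)
        err = y - target
        # Step 8: plain SGD on 0.5*||err||^2 stands in for Eqs. (8) and (9).
        g_out = err * (y > 0)                     # through the outer ReLU
        g_Wlp1 = h.T @ g_out / len(X)
        g_h = (g_out @ W_lp1.T) * mask * (pre > 0)  # through mask and inner ReLU
        g_Wl = X.T @ g_h / len(X)
        W_lp1 -= lr * g_Wlp1
        W_l -= lr * g_Wl
    return mask, W_l, W_lp1

# Toy usage on the first hidden layer of the 784-500-300-10 MLP from the paper,
# keeping 250 of the 500 neurons (the choice of k is arbitrary here).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 784))
W1 = 0.05 * rng.standard_normal((784, 500))
W2 = 0.05 * rng.standard_normal((500, 300))
mask, W1_pruned, W2_pruned = prune_layer(W1, W2, X, k=250)
```

In the full method this subroutine would be applied for l = 1, ..., L−1 and followed by fine-tuning, matching steps 2 and 12 of Algorithm 1.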
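
For quick reference, the hyperparameters quoted in the Experiment Setup row can be gathered in one place. The dictionary below is only a restatement of those numbers; the key names are ours and do not correspond to Caffe solver fields.

```python
# Restatement of the quoted hyperparameters; key names are illustrative.
TRAINING = {
    "baseline": {
        "batch_size": 128, "momentum": 0.9, "weight_decay": 0.0005,
        # MLP baseline: 18,000 iterations, lr 0.1, multiplied by 0.1 after
        # 1/3 and 2/3 of training, i.e. at iterations 6,000 and 12,000.
        "mlp_iters": 18000, "mlp_lr": 0.1, "mlp_lr_steps": [6000, 12000],
    },
    "pruning": {  # iterations used to minimize the layer-wise NRE
        "MLP": 1500, "VGGNet": 400, "AlexNet": 3000,
        "lambda_scaling": 512,  # scaling factor λ, shared by all three models
    },
    "fine_tuning": {  # (initial learning rate, iterations)
        "MLP": (0.1, 12000), "VGGNet": (0.01, 32000), "AlexNet": (0.001, 45000),
    },
}
```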