Discrimination-aware Channel Pruning for Deep Neural Networks
Authors: Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels outperforms the baseline model by 0.39% in top-1 accuracy. |
| Researcher Affiliation | Collaboration | Zhuangwei Zhuang1, Mingkui Tan1, Bohan Zhuang2, Jing Liu1, Yong Guo1, Qingyao Wu1, Junzhou Huang3,4, Jinhui Zhu1. 1South China University of Technology, 2The University of Adelaide, 3University of Texas at Arlington, 4Tencent AI Lab |
| Pseudocode | Yes | Algorithm 1: Discrimination-aware channel pruning (DCP). Algorithm 2: Greedy algorithm for channel selection. (A hedged sketch of the greedy selection step follows the table.) |
| Open Source Code | Yes | The source code of our method can be found at https://github.com/SCUT-AILab/DCP. |
| Open Datasets | Yes | We evaluate the performance of various methods on three datasets, including CIFAR-10 [20], ILSVRC-12 [4], and LFW [17]. CIFAR-10 consists of 50k training samples and 10k testing images with 10 classes. ILSVRC-12 contains 1.28 million training samples and 50k testing images for 1000 classes. LFW [17] contains 13,233 face images from 5,749 identities. |
| Dataset Splits | No | The paper states the number of training and testing samples for CIFAR-10 and ILSVRC-12 (e.g., 'CIFAR-10 consists of 50k training samples and 10k testing images'). For LFW, it reports 'ten-fold validation accuracy'. However, it does not explicitly define a validation split (e.g., percentages or sample counts) beyond what is implied by ten-fold cross validation on LFW. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states 'We implement the proposed method on PyTorch [32]', but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use SGD with Nesterov momentum [30] for the optimization. The momentum and weight decay are set to 0.9 and 0.0001, respectively. We set λ to 1.0 in our experiments by default. On CIFAR-10, we fine-tune for 400 epochs using a mini-batch size of 128. The learning rate is initialized to 0.1 and divided by 10 at epochs 160 and 240. On ILSVRC-12, we fine-tune the network for 60 epochs with a mini-batch size of 256. The learning rate starts at 0.01 and is divided by 10 at epochs 36, 48, and 54. (A hedged PyTorch sketch of this schedule follows the table.) |
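
The pseudocode row above points to the paper's Algorithm 2 (greedy channel selection). The snippet below is a minimal sketch of a gradient-norm-based greedy selection step, written against the paper's description rather than the released code at https://github.com/SCUT-AILab/DCP; the function name `greedy_channel_selection`, its arguments, and the omission of the inner weight re-optimization step are illustrative assumptions.

```python
import torch

def greedy_channel_selection(grad_W, num_keep):
    """Sketch of greedy channel selection by gradient magnitude.

    grad_W:   gradient of the joint (discrimination-aware + reconstruction)
              loss w.r.t. one conv layer's weights, shape (out_c, in_c, k, k).
    num_keep: number of input channels to retain after pruning.
    Returns the sorted indices of the selected input channels.
    """
    # Score each input channel by the squared Frobenius norm of its gradient slice.
    channel_scores = grad_W.pow(2).sum(dim=(0, 2, 3))
    selected, candidates = [], set(range(channel_scores.numel()))
    for _ in range(num_keep):
        # Greedily add the highest-scoring remaining channel.
        best = max(candidates, key=lambda c: channel_scores[c].item())
        selected.append(best)
        candidates.remove(best)
        # The paper additionally re-optimizes the weights of the channels
        # selected so far before re-scoring; that inner step is omitted here.
    return sorted(selected)
```

In the full method this selection is repeated per layer, with the discrimination-aware losses attached to intermediate layers driving the gradients.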
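
The experiment-setup row quotes concrete fine-tuning hyperparameters. Below is a hedged PyTorch sketch of the stated CIFAR-10 schedule (SGD with Nesterov momentum 0.9, weight decay 0.0001, learning rate 0.1 divided by 10 at epochs 160 and 240, 400 epochs); `model` and `train_one_epoch` are placeholders, not part of the paper.

```python
import torch

# model: the pruned network to be fine-tuned (placeholder).
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate on CIFAR-10
    momentum=0.9,
    weight_decay=1e-4,
    nesterov=True,
)
# Divide the learning rate by 10 at epochs 160 and 240.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[160, 240], gamma=0.1
)

for epoch in range(400):
    train_one_epoch(model, optimizer)  # placeholder loop, mini-batch size 128
    scheduler.step()
```

For ILSVRC-12 the same pattern would use an initial learning rate of 0.01, milestones [36, 48, 54], 60 epochs, and a mini-batch size of 256.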