Operation-Aware Soft Channel Pruning using Differentiable Masks
Authors: Minsoo Kang, Bohyung Han
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments and achieve outstanding performance in terms of the accuracy of output networks given the same amount of resources when compared with the state-of-the-art methods. |
| Researcher Affiliation | Academia | 1Computer Vision Laboratory, Department of Electrical and Computer Engineering & ASRI, Seoul National University, Korea. Correspondence to: Bohyung Han <bhhan@snu.ac.kr>. |
| Pseudocode | No | The paper describes its method using mathematical equations and text, but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We employ CIFAR-10/100 and ILSVRC-12 in our experiment, which are the datasets widely accepted for evaluation of model compression techniques. The ILSVRC-12 dataset (Russakovsky et al., 2015) contains 1,281,167 training images in color and 50,000 color images for validation in 1,000 classes. |
| Dataset Splits | Yes | The CIFAR-10/100 datasets consist of 50K and 10K color image splits for training and testing, where the size of each image is 32x32. On the other hand, the ILSVRC-12 dataset (Russakovsky et al., 2015) contains 1,281,167 training images in color and 50,000 color images for validation in 1,000 classes. |
| Hardware Specification | Yes | To observe the realistic and practical speed-up of the proposed method, we measure the wall-clock inference time for the unpruned and pruned models on the NVIDIA TITAN Xp with a batch size of 64. (A timing sketch illustrating this protocol follows the table.) |
| Software Dependencies | No | The proposed method is implemented using the TensorFlow library (Abadi et al., 2015). However, no specific version number for TensorFlow or any other software dependency is provided. |
| Experiment Setup | Yes | We train the network using SGD with Nesterov momentum (Sutskever et al., 2013) of 0.9, a weight decay of 0.0001, and an initial learning rate of 0.1. The setting for the experiment on the CIFAR datasets follows the one used in (Liu et al., 2017), where the batch size is set to 64 and the learning rate is reduced by a factor of 10 after the 80th and 120th epochs. For the ILSVRC-12 dataset, the network is trained for 100 epochs on 4 GPUs with a total batch size of 256, and the learning rate is dropped by a factor of 10 at the 30th, 60th, and 90th epochs. Fine-tuning the pruned network uses the same setting except for an initial learning rate of 0.01. We set the temperature parameter τ for Gumbel-Softmax (Jang et al., 2017) to 0.5 and the CDF threshold δ to 0.05 for all pruned models. All networks are trained from scratch. (A sketch of the CIFAR schedule follows the table.) |
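The Experiment Setup row fully specifies the CIFAR optimization schedule, so a minimal sketch is included below. It is not the authors' code (none is released); it only illustrates, in TensorFlow (the library the paper reports using), how the stated hyperparameters could be wired together: SGD with Nesterov momentum 0.9, weight decay 0.0001, and an initial learning rate of 0.1 dropped by a factor of 10 after epochs 80 and 120 at batch size 64. The `STEPS_PER_EPOCH` constant and the choice of an L2 kernel regularizer to implement weight decay are assumptions made for illustration.

```python
# Sketch only: reconstructs the reported CIFAR training schedule, not the authors' code.
import tensorflow as tf

BATCH_SIZE = 64
STEPS_PER_EPOCH = 50_000 // BATCH_SIZE  # CIFAR-10/100 have 50K training images

# Initial learning rate 0.1, reduced by a factor of 10 after the 80th and 120th epochs.
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[80 * STEPS_PER_EPOCH, 120 * STEPS_PER_EPOCH],
    values=[0.1, 0.01, 0.001],
)

# SGD with Nesterov momentum 0.9, as reported in the paper.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=lr_schedule, momentum=0.9, nesterov=True
)

# The 0.0001 weight decay is assumed here to be applied as an L2 kernel regularizer
# on each convolutional/dense layer when the network is built.
weight_decay = tf.keras.regularizers.L2(1e-4)
```

Per the same row, fine-tuning the pruned network would reuse this setup with the initial learning rate changed to 0.01.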
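The Hardware Specification row describes a wall-clock timing protocol (batch size 64 on an NVIDIA TITAN Xp). The sketch below shows one common way such a measurement is done; the warm-up and iteration counts, the input shape, and the helper name `measure_inference_time` are illustrative assumptions rather than details taken from the paper.

```python
# Sketch only: a generic wall-clock inference-time measurement at batch size 64.
import time
import numpy as np
import tensorflow as tf

def measure_inference_time(model, input_shape=(64, 224, 224, 3), warmup=10, iters=100):
    """Return the average wall-clock seconds per forward pass of one batch."""
    batch = tf.constant(np.random.rand(*input_shape).astype("float32"))
    for _ in range(warmup):            # warm-up passes exclude one-time initialization cost
        _ = model(batch, training=False)
    start = time.perf_counter()
    for _ in range(iters):
        _ = model(batch, training=False)
    return (time.perf_counter() - start) / iters

# Example usage with a stand-in architecture (the paper evaluates several networks):
# seconds_per_batch = measure_inference_time(tf.keras.applications.ResNet50(weights=None))
```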