Dynamic Structure Pruning for Compressing CNNs

Authors: Jun-Hyung Park, Yeachan Kim, Junho Kim, Joon-Young Choi, SangKeun Lee

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that dynamic structure pruning achieves state-of-the-art pruning performance and better realistic acceleration on a GPU compared with channel pruning. In particular, it reduces the FLOPs of ResNet50 by 71.85% without accuracy degradation on the ImageNet dataset. Our code is available at https://github.com/irishev/DSP.
Researcher Affiliation | Academia | Jun-Hyung Park1, Yeachan Kim2, Junho Kim2, Joon-Young Choi2, SangKeun Lee1,2; 1Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea; 2Department of Artificial Intelligence, Korea University, Seoul, Republic of Korea; {irish07, yeachan, monocrat, johnjames, yalphy}@korea.ac.kr
Pseudocode | Yes | Algorithm 1: Dynamic Structure Pruning (an illustrative structured-pruning sketch follows the table)
Open Source Code | Yes | Our code is available at https://github.com/irishev/DSP.
Open Datasets | Yes | We validate the effectiveness of dynamic structure pruning through extensive experiments with diverse network architectures on the CIFAR-10 (Krizhevsky, Hinton et al. 2009) and ImageNet (Deng et al. 2009) datasets.
Dataset Splits | Yes | We report test/validation accuracy of pruned models (P. Acc.) for CIFAR-10/ImageNet, the accuracy difference between the original and pruned models (ΔAcc.), and the pruning rates of parameters (Params ↓) and FLOPs (FLOPs ↓). (A small metric-computation example follows the table.)
Hardware Specification | Yes | The experiments are implemented using PyTorch and conducted on a Linux machine with an Intel i9-10980XE CPU and 4 NVIDIA RTX A5000 GPUs.
Software Dependencies | No | The paper mentions implementing experiments using PyTorch, but does not provide specific version numbers for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | We search the hyperparameters for dynamic structure pruning based on the empirical analysis, i.e., the value of τ ∈ {0.125, 0.25, 0.5, 1}, λ ∈ {5e-4, 1e-3, 2e-3, 3e-3} for CIFAR-10, and λ ∈ {1e-4, 2e-4, 3e-4, 5e-4} for ImageNet. We use the Adam optimizer with a learning rate of 0.001 and momentum of (0.9, 0.999) to train group parameters. During differentiable group learning, we set the initial learning rate to 0.05 and train models for 120 and 60 epochs in the CIFAR-10 and ImageNet experiments, respectively. Then, pruned models are fine-tuned for 80 epochs with initial learning rates of 0.015 and 0.05 for five and three iterations in the CIFAR-10 and ImageNet experiments, respectively. We use cosine learning rate scheduling with weight decay of 1e-3 and 3e-5 for the CIFAR-10 and ImageNet experiments, respectively, to yield the best results fitted to our additional regularization. (A hedged training-configuration sketch follows the table.)
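
The Pseudocode row refers to Algorithm 1 of the paper, which is not reproduced here. As a rough illustration of the broader family the algorithm belongs to, the sketch below performs group-wise channel pruning of a convolution by L2 norm in PyTorch; the contiguous grouping, the norm-based scoring, and the function name are illustrative assumptions, not the paper's dynamic grouping or its Algorithm 1.

```python
import torch
import torch.nn as nn

def prune_channel_groups(conv: nn.Conv2d, n_groups: int = 4, prune_ratio: float = 0.5):
    """Zero out the lowest-L2-norm groups of output channels (illustrative only)."""
    out_ch = conv.out_channels
    assert out_ch % n_groups == 0, "out_channels must be divisible by n_groups"
    group_size = out_ch // n_groups

    with torch.no_grad():
        # L2 norm of each contiguous group of output channels: shape (n_groups,)
        w = conv.weight.view(n_groups, group_size, -1)
        group_norms = w.pow(2).sum(dim=(1, 2)).sqrt()

        # Keep the highest-norm groups, mask the rest to zero.
        n_prune = int(n_groups * prune_ratio)
        pruned = torch.argsort(group_norms)[:n_prune]
        mask = torch.ones(n_groups, device=conv.weight.device)
        mask[pruned] = 0.0
        conv.weight.mul_(mask.repeat_interleave(group_size).view(-1, 1, 1, 1))
        if conv.bias is not None:
            conv.bias.mul_(mask.repeat_interleave(group_size))

# Example: mask half of the channel groups in a single layer.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
prune_channel_groups(layer, n_groups=4, prune_ratio=0.5)
```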
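
The Dataset Splits row lists the reported quantities. As a small illustration of how those metrics relate to one another (not code from the paper), the helper below computes ΔAcc. and the Params/FLOPs reduction rates from original and pruned counts; the numbers in the usage line are hypothetical.

```python
def pruning_report(orig_acc, pruned_acc, orig_params, pruned_params, orig_flops, pruned_flops):
    """Return pruned accuracy, ΔAcc., and Params/FLOPs reduction rates in percent."""
    return {
        "P. Acc. (%)": pruned_acc,
        "ΔAcc. (%)": pruned_acc - orig_acc,
        "Params ↓ (%)": 100.0 * (1.0 - pruned_params / orig_params),
        "FLOPs ↓ (%)": 100.0 * (1.0 - pruned_flops / orig_flops),
    }

# Hypothetical example values, for illustration only.
print(pruning_report(94.0, 93.8, 1.7e6, 0.6e6, 2.5e8, 1.0e8))
```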
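
The Experiment Setup row quotes the optimizers, learning rates, and schedules. The snippet below sketches how the CIFAR-10 group-learning configuration might be expressed in PyTorch; the placeholder model, the shape of the group parameters, and the choice of SGD for the model weights are assumptions rather than the authors' released code (see the linked repository for the actual implementation).

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder model and "group parameters"; in the paper these would be the CNN
# weights and the learnable grouping variables of dynamic structure pruning.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
group_params = [nn.Parameter(torch.zeros(16, 4))]  # shape is illustrative

epochs = 120  # CIFAR-10 group-learning phase (60 epochs for ImageNet)

# Adam with lr 0.001 and betas (0.9, 0.999) for the group parameters, as quoted above.
group_opt = torch.optim.Adam(group_params, lr=1e-3, betas=(0.9, 0.999))

# Model weights: initial lr 0.05 and weight decay 1e-3 (CIFAR-10) with a cosine
# schedule, as quoted; SGD itself is an assumption, since the quote does not name
# the weight optimizer. Fine-tuning would reuse this pattern with lr 0.015 for 80 epochs.
weight_opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-3)
weight_sched = CosineAnnealingLR(weight_opt, T_max=epochs)
```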