PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Authors: Shiyu Li, Edward Hanson, Hai Li, Yiran Chen

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that we can prune 97% of parameters and 92% of FLOPs on ResNet18/CIFAR-10 with no accuracy loss, and achieve a 44% reduction in run-time memory consumption and a 53% reduction in inference latency.
Researcher Affiliation | Academia | Shiyu Li, Edward Hanson, Hai Li, Yiran Chen; Department of Electrical and Computer Engineering, Duke University, Durham, NC, United States. Correspondence to: Shiyu Li <shiyu.li@duke.edu>.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at: https://github.com/timlee0212/PENNI.
Open Datasets | Yes | Experiments were held on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets.
Dataset Splits | Yes | Experiments were held on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets. On CIFAR-10, we chose VGG16 (Simonyan & Zisserman, 2014), ResNet18 and ResNet56 (He et al., 2016) for experimentation. On ImageNet, we used AlexNet (Krizhevsky et al., 2012) and ResNet50 for the experiment, incorporating the pretrained models provided by PyTorch (PyTorch, 2019).
Hardware Specification | Yes | Hardware Settings: We used an Intel Xeon Gold 6136 to test inference performance on the CPU platform and an NVIDIA Titan X for the GPU platform.
Software Dependencies | Yes | For software, we used PyTorch 1.4 (Paszke et al., 2019) to implement the inference test.
Experiment Setup | Yes | All pretraining, retraining and fine-tuning procedures used Stochastic Gradient Descent (SGD) as the optimizer with 10^-4 weight decay, 0.9 momentum, and batch size set to 128. We selected d = 5 for the decomposition stage and retrained for 100 epochs with a 0.01 initial learning rate and the same scheduling. Regularization strength was set to γ = 10^-4. The interval between training the basis and the coefficients was set to 5 epochs. The final fine-tuning procedure took 30 epochs with a 0.01 initial learning rate and the same scheduling scheme.
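
To make the Experiment Setup row concrete, the sketch below assembles the reported hyperparameters (SGD with 10^-4 weight decay, 0.9 momentum, batch size 128, 0.01 initial learning rate, γ = 10^-4, 100 retraining epochs) in PyTorch. This is a minimal illustration, not the authors' code: the model choice, the learning-rate schedule (the excerpt only says "the same scheduling"), and the regularization term standing in for PENNI's coefficient-sparsity penalty are assumptions.

```python
# Hedged sketch of the reported training configuration (illustrative only).
# Assumptions: ResNet18 as the CIFAR-10 model, a MultiStepLR schedule, and a
# plain L1 penalty standing in for PENNI's coefficient-sparsity regularizer.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR-10 with the reported batch size of 128.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder model; the paper evaluates VGG16, ResNet18, and ResNet56 on CIFAR-10.
model = torchvision.models.resnet18(num_classes=10).to(device)

# SGD with 0.01 initial learning rate, 0.9 momentum, 1e-4 weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
# Assumed schedule; the excerpt does not specify the milestones.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 75], gamma=0.1)

criterion = nn.CrossEntropyLoss()
gamma_reg = 1e-4  # regularization strength gamma reported in the paper

for epoch in range(100):  # retraining stage reported as 100 epochs
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        # Stand-in for PENNI's sparsity penalty on the coefficient matrices.
        loss = loss + gamma_reg * sum(p.abs().sum() for p in model.parameters())
        loss.backward()
        optimizer.step()
    scheduler.step()
```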