PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Authors: Shiyu Li, Edward Hanson, Hai Li, Yiran Chen

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that we can prune 97% of parameters and 92% of FLOPs on ResNet18/CIFAR-10 with no accuracy loss, and achieve a 44% reduction in run-time memory consumption and a 53% reduction in inference latency.
Researcher Affiliation | Academia | Shiyu Li, Edward Hanson, Hai Li, Yiran Chen; Department of Electrical and Computer Engineering, Duke University, Durham, NC, United States. Correspondence to: Shiyu Li <shiyu.li@duke.edu>.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at: https://github.com/timlee0212/PENNI.
Open Datasets | Yes | Experiments were held on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets.
Dataset Splits | Yes | Experiments were held on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) datasets. On CIFAR-10, we chose VGG16 (Simonyan & Zisserman, 2014), ResNet18 and ResNet56 (He et al., 2016) for experimentation. On ImageNet, we used AlexNet (Krizhevsky et al., 2012) and ResNet50 for the experiment, incorporating the pretrained models provided by PyTorch (PyTorch, 2019).
Hardware Specification | Yes | Hardware Settings: We used an Intel Xeon Gold 6136 to test inference performance on the CPU platform and an NVIDIA Titan X for the GPU platform.
Software Dependencies | Yes | For software, we used PyTorch 1.4 (Paszke et al., 2019) to implement the inference test.
Experiment Setup | Yes | All pretraining, retraining and fine-tuning procedures used Stochastic Gradient Descent (SGD) as the optimizer with 10^-4 weight decay, 0.9 momentum, and batch size set to 128. We selected d = 5 for the decomposition stage and retrained for 100 epochs with a 0.01 initial learning rate and the same scheduling. Regularization strength was set to γ = 10^-4. The interval between training the basis and the coefficients was set to 5 epochs. The final fine-tuning procedure took 30 epochs with a 0.01 initial learning rate and the same scheduling scheme.
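
To make the Experiment Setup row concrete, the sketch below assembles the reported hyperparameters (SGD with 10^-4 weight decay, 0.9 momentum, batch size 128, 0.01 initial learning rate, γ = 10^-4, 100 retraining epochs) in PyTorch. This is a minimal illustration, not the authors' code: the model choice, the learning-rate schedule (the excerpt only says "the same scheduling"), and the regularization term standing in for PENNI's coefficient-sparsity penalty are assumptions.

```python
# Hedged sketch of the reported training configuration (illustrative only).
# Assumptions: ResNet18 as the CIFAR-10 model, a MultiStepLR schedule, and a
# plain L1 penalty standing in for PENNI's coefficient-sparsity regularizer.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR-10 with the reported batch size of 128.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder model; the paper evaluates VGG16, ResNet18, and ResNet56 on CIFAR-10.
model = torchvision.models.resnet18(num_classes=10).to(device)

# SGD with 0.01 initial learning rate, 0.9 momentum, 1e-4 weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
# Assumed schedule; the excerpt does not specify the milestones.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 75], gamma=0.1)

criterion = nn.CrossEntropyLoss()
gamma_reg = 1e-4  # regularization strength gamma reported in the paper

for epoch in range(100):  # retraining stage reported as 100 epochs
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        # Stand-in for PENNI's sparsity penalty on the coefficient matrices.
        loss = loss + gamma_reg * sum(p.abs().sum() for p in model.parameters())
        loss.backward()
        optimizer.step()
    scheduler.step()
```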