Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Authors: Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey

ICLR 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Section 3 demonstrates the effectiveness of these developments on AlexNet and GoogLeNet on a variety of platforms." and "Our sparse CNN design is evaluated on three platforms shown in Table 1." |
| Researcher Affiliation | Collaboration | 1. Intel Labs; 2. Department of Electrical and Computer Engineering, University of Pittsburgh; 3. Department of Electrical and Computer Engineering, Duke University |
| Pseudocode | Yes | "Figure 2: Sparse convolution pseudo code." (A minimal sketch of the idea follows the table.) |
| Open Source Code | Yes | "Our sparse CNN is implemented as an extension of the Caffe deep learning framework (Jia et al., 2014) and is at https://github.com/IntelLabs/SkimCaffe." |
| Open Datasets | Yes | "We train with the ImageNet ILSVRC-2012 dataset (Deng et al., 2009)." |
| Dataset Splits | No | The paper mentions training with the ImageNet ILSVRC-2012 dataset and evaluating on the ImageNet test set, but it does not explicitly describe the training/validation/test splits used for its experiments (e.g., percentages or sample counts for validation). |
| Hardware Specification | Yes | "Our sparse CNN design is evaluated on three platforms shown in Table 1. Intel C2750 (Atom) ... Xeon E5-2697 v4 (BDW) ... Xeon Phi 7250 (KNL)" |
| Software Dependencies | Yes | "We use Intel compiler version 17.0.0 and use all cores available. The SGEMM performance and achievable memory bandwidth listed are measured with Intel MKL version 2017 and the STREAM benchmark (McCalpin), respectively." |
| Experiment Setup | Yes | "In general, the pruning step no longer improves after 450K and 900K mini-batch iterations for AlexNet and GoogLeNet, respectively. The re-training step saturates around 150K and 300K mini-batch iterations. To see trade-offs among accuracy, speed, and model size, we try various weight decays ranging from 1e-5 to 1e-3, and, for AlexNet, decay multipliers for fc layer ranging from 1e-2 to 1. We find that the starting learning rate of 1e-3 and weight decay of 5e-5 in general gives a high sparsity with minimal accuracy drop. We reduce the learning rate by 10 for re-training step." (Condensed in the second sketch after the table.) |
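
For readers who want the gist of the sparse convolution pseudo code cited in the Pseudocode row, here is a minimal NumPy sketch of direct sparse convolution as the paper describes it: the non-zero weights of each output channel are kept in CSR form, with column indices precomputed as offsets into the zero-padded, flattened input tensor, so no im2col lowering is needed. Function and variable names are illustrative, not taken from SkimCaffe, and the sketch assumes stride 1 with a pre-padded single-image input rather than the paper's optimized C++ kernel.

```python
import numpy as np

def weights_to_offset_csr(weights, h_pad, w_pad):
    """Store the non-zeros of a dense (OC, IC, KH, KW) weight tensor in a
    CSR-like layout whose column indices are offsets into the zero-padded,
    flattened (IC * h_pad * w_pad) input -- the core idea behind direct
    sparse convolution. Illustrative sketch only, not the SkimCaffe kernel."""
    oc, ic, kh, kw = weights.shape
    values, colidx, rowptr = [], [], [0]
    for o in range(oc):
        for c in range(ic):
            for r in range(kh):
                for s in range(kw):
                    w = weights[o, c, r, s]
                    if w != 0.0:
                        values.append(w)
                        # flattened position of input element (c, r, s)
                        colidx.append((c * h_pad + r) * w_pad + s)
        rowptr.append(len(values))
    return (np.array(values, dtype=np.float32),
            np.array(colidx, dtype=np.int64),
            np.array(rowptr, dtype=np.int64))

def direct_sparse_conv(x_padded, values, colidx, rowptr, out_h, out_w):
    """x_padded: one zero-padded image of shape (IC, h_pad, w_pad).
    Returns an (OC, out_h, out_w) output for a stride-1 convolution."""
    _, _, w_pad = x_padded.shape
    x_flat = x_padded.reshape(-1)
    num_out_channels = len(rowptr) - 1
    out = np.zeros((num_out_channels, out_h, out_w), dtype=np.float32)
    for o in range(num_out_channels):
        for k in range(rowptr[o], rowptr[o + 1]):
            off, w = colidx[k], values[k]
            # Each non-zero weight multiplies a shifted window of the
            # flattened input; doing one output row at a time keeps the
            # inner access contiguous (and vectorizable).
            for y in range(out_h):
                start = off + y * w_pad
                out[o, y, :] += w * x_flat[start:start + out_w]
    return out
```

With out_h = h_pad - KH + 1 and out_w = w_pad - KW + 1, the result should match a dense stride-1 convolution of the same weights, which is a simple way to sanity-check the sketch.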
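
Similarly, the Experiment Setup row can be condensed into a small, purely illustrative summary of the two-phase prune-then-retrain schedule. The numbers come from the quote above; the dictionary names and structure are hypothetical and do not correspond to the authors' actual Caffe solver files.

```python
# Hypothetical condensation of the quoted setup (not the authors' solver files).
PRUNE_PHASE = {
    "base_lr": 1e-3,                      # starting learning rate
    "weight_decay": 5e-5,                 # chosen from the explored range 1e-5 .. 1e-3
    "fc_decay_mult_range": (1e-2, 1.0),   # explored for AlexNet fc layers only
    "max_iter": {"alexnet": 450_000, "googlenet": 900_000},  # pruning stops improving here
}
RETRAIN_PHASE = {
    "base_lr": 1e-4,                      # starting rate reduced by 10x for re-training
    "max_iter": {"alexnet": 150_000, "googlenet": 300_000},  # re-training saturates here
}
```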