Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

Authors: Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimented our approach through several image learning benchmarks and demonstrate its interesting aspects and competitive performance. We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment our approach with the pre-trained ResNet-101 on ILSVRC2012 image classification dataset (He et al., 2016). We describe an image segmentation experiment whose neural network model is composed from an inception-like network branch and a densenet network branch.
Researcher Affiliation | Collaboration | Jianbo Ye, College of Information Sciences and Technology, The Pennsylvania State University (jxy198@ist.psu.edu); Xin Lu and Zhe Lin, Adobe Research ({xinl,zlin}@adobe.com); James Z. Wang, College of Information Sciences and Technology, The Pennsylvania State University (jwang@ist.psu.edu)
Pseudocode | Yes | Section 4.2, The Algorithm: We describe our algorithm below. The following method applies to both training from scratch or re-training from a pre-trained model. Given a training loss l, a convolutional neural net N, and hyper-parameters ρ, α, µ₀, our method proceeds as follows: 1. Computation of sparse penalty for each layer. ... 2. γ-W rescaling trick. ... 3. End-to-End training with ISTA on γ. ... 4. Post-process to remove constant channels. ... 5. γ-W rescaling trick. ... 6. Fine-tune the pruned model Ñ using regular stochastic gradient learning. (A minimal sketch of steps 2, 3, and 5 follows the table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment our approach with the pre-trained ResNet-101 on ILSVRC2012 image classification dataset (He et al., 2016). This model was originally trained on multiple datasets. COCO-person (Lin et al., 2014)
Dataset Splits | No | The paper mentions 'test accuracy' and 'test datasets' but does not explicitly describe the training, validation, and test splits (e.g., percentages or counts) for the datasets used.
Hardware Specification | No | The paper mentions 'across 4 GPUs' in relation to batch size, but does not specify the model or type of GPUs used for the experiments.
Software Dependencies | No | The paper mentions a 'pre-trained TensorFlow ResNet-101 model' but does not specify the version of TensorFlow or any other software dependencies with version numbers.
Experiment Setup | Yes | We use a fixed learning rate µₜ = 0.01, scaling parameter α = 1.0, and set batch size to 125. We use a warm-up ρ = 0.001 for 30k steps and then train with ρ = 0.005. We set the scaling parameter α = 0.01, the initial learning rate µₜ = 0.001, the sparsity penalty ρ = 0.1, and the batch size = 128 (across 4 GPUs). The learning rate is decayed every four epochs with rate 0.86. We set α = 0.01, ρ = 0.5, µₜ = 2 × 10⁻⁵, and batch size = 24. (These settings are grouped by experiment in the configuration sketch after the table.)
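
The pseudocode row names three technical steps that can be made concrete: step 3 is an ISTA (proximal gradient) update on the batch-normalization scales γ, i.e. a gradient step on the training loss followed by soft-thresholding with a per-layer penalty, and steps 2 and 5 rescale γ (and β) against the weights of the following layer. The NumPy sketch below illustrates both under those assumptions; the function names and the per-layer penalty weight lam are illustrative placeholders, not the authors' code (no repository is linked).

    import numpy as np

    def ista_step_on_gamma(gamma, grad_gamma, mu, rho, lam):
        # One ISTA step on a layer's BN scales gamma: a gradient step on the
        # training loss, then soft-thresholding, so entries of gamma can reach
        # exactly zero (those constant channels are removed in step 4).
        z = gamma - mu * grad_gamma          # plain gradient step with step size mu
        thresh = mu * rho * lam              # effective L1 threshold: rho times per-layer weight
        return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

    def gamma_w_rescale(gamma, beta, next_layer_weights, alpha):
        # gamma-W rescaling trick (sketch): scale the BN affine parameters by
        # alpha and the following layer's kernel by 1/alpha; with a ReLU in
        # between (positively homogeneous for alpha > 0) the network function
        # is unchanged, only the magnitude of gamma relative to the sparse
        # penalty changes.
        return gamma * alpha, beta * alpha, next_layer_weights / alpha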
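The experiment-setup row quotes three groups of hyper-parameters (CIFAR-10, ILSVRC2012/ResNet-101, and the segmentation model). The configuration sketch below only regroups those reported values for readability; the experiment names and dictionary keys are assumptions, not the paper's terminology.

    # Illustrative regrouping of the reported hyper-parameters; values are as
    # quoted in the table row, names are assumptions.
    EXPERIMENT_SETTINGS = {
        "cifar10_convnet_resnet20": {
            "learning_rate": 0.01,      # fixed mu_t
            "alpha": 1.0,
            "batch_size": 125,
            "rho_warmup": 0.001,        # for the first 30k steps
            "warmup_steps": 30000,
            "rho": 0.005,
        },
        "ilsvrc2012_resnet101": {
            "alpha": 0.01,
            "initial_learning_rate": 0.001,
            "rho": 0.1,
            "batch_size": 128,          # across 4 GPUs
            "lr_decay_every_epochs": 4,
            "lr_decay_rate": 0.86,
        },
        "segmentation_inception_densenet_branches": {
            "alpha": 0.01,
            "rho": 0.5,
            "learning_rate": 2e-5,
            "batch_size": 24,
        },
    }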