Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

Authors: Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimented our approach through several image learning benchmarks and demonstrate its interesting aspects and competitive performance. We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment our approach with the pre-trained ResNet-101 on ILSVRC2012 image classification dataset (He et al., 2016). We describe an image segmentation experiment whose neural network model is composed from an inception-like network branch and a densenet network branch.
Researcher Affiliation | Collaboration | Jianbo Ye, College of Information Sciences and Technology, The Pennsylvania State University (jxy198@ist.psu.edu); Xin Lu and Zhe Lin, Adobe Research ({xinl,zlin}@adobe.com); James Z. Wang, College of Information Sciences and Technology, The Pennsylvania State University (jwang@ist.psu.edu)
Pseudocode | Yes | Section 4.2, The Algorithm: We describe our algorithm below. The following method applies to both training from scratch or re-training from a pre-trained model. Given a training loss l, a convolutional neural net N, and hyper-parameters ρ, α, µ₀, our method proceeds as follows: 1. Computation of sparse penalty for each layer. ... 2. γ-W rescaling trick. ... 3. End-to-End training with ISTA on γ. ... 4. Post-process to remove constant channels. ... 5. γ-W rescaling trick. ... 6. Fine-tune the pruned model Ñ using regular stochastic gradient learning. (A minimal sketch of steps 2, 3, and 5 follows the table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment our approach with the pre-trained ResNet-101 on ILSVRC2012 image classification dataset (He et al., 2016). This model was originally trained on multiple datasets. COCO-person (Lin et al., 2014)
Dataset Splits | No | The paper mentions 'test accuracy' and 'test datasets' but does not explicitly describe the training, validation, and test splits (e.g., percentages or counts) for the datasets used.
Hardware Specification | No | The paper mentions 'across 4 GPUs' in relation to batch size, but does not specify the model or type of GPUs used for the experiments.
Software Dependencies | No | The paper mentions a 'pre-trained TensorFlow ResNet-101 model' but does not specify the version of TensorFlow or any other software dependencies with version numbers.
Experiment Setup | Yes | We use a fixed learning rate µₜ = 0.01, scaling parameter α = 1.0, and set batch size to 125. We use a warm-up ρ = 0.001 for 30k steps and then train with ρ = 0.005. We set the scaling parameter α = 0.01, the initial learning rate µₜ = 0.001, the sparsity penalty ρ = 0.1, and the batch size = 128 (across 4 GPUs). The learning rate is decayed every four epochs with rate 0.86. We set α = 0.01, ρ = 0.5, µₜ = 2 × 10⁻⁵, and batch size = 24. (These settings are grouped by experiment in the configuration sketch after the table.)
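
The pseudocode row names three technical steps that can be made concrete: step 3 is an ISTA (proximal gradient) update on the batch-normalization scales γ, i.e. a gradient step on the training loss followed by soft-thresholding with a per-layer penalty, and steps 2 and 5 rescale γ (and β) against the weights of the following layer. The NumPy sketch below illustrates both under those assumptions; the function names and the per-layer penalty weight lam are illustrative placeholders, not the authors' code (no repository is linked).

    import numpy as np

    def ista_step_on_gamma(gamma, grad_gamma, mu, rho, lam):
        # One ISTA step on a layer's BN scales gamma: a gradient step on the
        # training loss, then soft-thresholding, so entries of gamma can reach
        # exactly zero (those constant channels are removed in step 4).
        z = gamma - mu * grad_gamma          # plain gradient step with step size mu
        thresh = mu * rho * lam              # effective L1 threshold: rho times per-layer weight
        return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

    def gamma_w_rescale(gamma, beta, next_layer_weights, alpha):
        # gamma-W rescaling trick (sketch): scale the BN affine parameters by
        # alpha and the following layer's kernel by 1/alpha; with a ReLU in
        # between (positively homogeneous for alpha > 0) the network function
        # is unchanged, only the magnitude of gamma relative to the sparse
        # penalty changes.
        return gamma * alpha, beta * alpha, next_layer_weights / alpha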
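The experiment-setup row quotes three groups of hyper-parameters (CIFAR-10, ILSVRC2012/ResNet-101, and the segmentation model). The configuration sketch below only regroups those reported values for readability; the experiment names and dictionary keys are assumptions, not the paper's terminology.

    # Illustrative regrouping of the reported hyper-parameters; values are as
    # quoted in the table row, names are assumptions.
    EXPERIMENT_SETTINGS = {
        "cifar10_convnet_resnet20": {
            "learning_rate": 0.01,      # fixed mu_t
            "alpha": 1.0,
            "batch_size": 125,
            "rho_warmup": 0.001,        # for the first 30k steps
            "warmup_steps": 30000,
            "rho": 0.005,
        },
        "ilsvrc2012_resnet101": {
            "alpha": 0.01,
            "initial_learning_rate": 0.001,
            "rho": 0.1,
            "batch_size": 128,          # across 4 GPUs
            "lr_decay_every_epochs": 4,
            "lr_decay_rate": 0.86,
        },
        "segmentation_inception_densenet_branches": {
            "alpha": 0.01,
            "rho": 0.5,
            "learning_rate": 2e-5,
            "batch_size": 24,
        },
    }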