Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
Authors: Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimented with our approach on several image learning benchmarks and demonstrated its interesting aspects and competitive performance. We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment with our approach on the pre-trained ResNet-101 on the ILSVRC2012 image classification dataset (He et al., 2016). We describe an image segmentation experiment whose neural network model is composed of an inception-like network branch and a DenseNet network branch. |
| Researcher Affiliation | Collaboration | Jianbo Ye, College of Information Sciences and Technology, The Pennsylvania State University (jxy198@ist.psu.edu); Xin Lu and Zhe Lin, Adobe Research ({xinl,zlin}@adobe.com); James Z. Wang, College of Information Sciences and Technology, The Pennsylvania State University (jwang@ist.psu.edu) |
| Pseudocode | Yes | 4.2 THE ALGORITHM: We describe our algorithm below. The following method applies to both training from scratch and re-training from a pre-trained model. Given a training loss l, a convolutional neural net N, and hyper-parameters ρ, α, µ0, our method proceeds as follows: 1. Computation of sparse penalty for each layer. ... 2. γ-W rescaling trick. ... 3. End-to-end training with ISTA on γ. ... 4. Post-process to remove constant channels. ... 5. γ-W rescaling trick. ... 6. Fine-tune Ñ using regular stochastic gradient learning. (A minimal sketch of the ISTA and rescaling steps follows this table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment with our approach on the pre-trained ResNet-101 on the ILSVRC2012 image classification dataset (He et al., 2016). This model was originally trained on multiple datasets, including COCO-person (Lin et al., 2014). |
| Dataset Splits | No | The paper mentions 'test accuracy' and 'test datasets' but does not explicitly describe the training, validation, and test splits (e.g., percentages or counts) for the datasets used. |
| Hardware Specification | No | The paper mentions 'across 4 GPUs' in relation to batch size, but does not specify the model or type of GPUs used for the experiments. |
| Software Dependencies | No | The paper mentions a 'pre-trained TensorFlow ResNet-101 model' but does not specify the version of TensorFlow or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We use a fixed learning rate µt = 0.01, scaling parameter α = 1.0, and set batch size to 125. We use a warm-up ρ = 0.001 for 30k steps and then train with ρ = 0.005. We set the scaling parameter α = 0.01, the initial learning rate µt = 0.001, the sparsity penalty ρ = 0.1, and the batch size = 128 (across 4 GPUs). The learning rate is decayed every four epochs with rate 0.86. We set α = 0.01, ρ = 0.5, µt = 2 × 10⁻⁵, and batch size = 24. |
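
The algorithm quoted in the Pseudocode row combines ordinary gradient descent on the network weights with an ISTA (soft-thresholding) update on the batch-normalization scales γ, together with a γ-W rescaling trick that leaves the network function unchanged. The sketch below is a minimal, illustrative PyTorch rendering of those two operations; the function names (`ista_step`, `gamma_w_rescale`), the scaling direction, and the framework are assumptions made here, since the authors did not release code (see the Open Source Code row).

```python
# Hedged sketch of the ISTA-on-gamma update and the gamma-W rescaling trick
# described in Section 4.2 of the paper. Helper names and the scaling
# direction are assumptions; this is not the authors' implementation.
import torch
import torch.nn as nn


def ista_step(bn: nn.BatchNorm2d, lr: float, rho: float) -> None:
    """Proximal (soft-thresholding) update applied to the batch-norm scales
    gamma after the usual gradient step:
        gamma <- sign(gamma) * max(|gamma| - lr * rho, 0)
    Channels whose gamma is driven exactly to zero become constant and can
    later be removed in the post-processing step."""
    with torch.no_grad():
        gamma = bn.weight
        gamma.copy_(torch.sign(gamma) * torch.clamp(gamma.abs() - lr * rho, min=0.0))


def gamma_w_rescale(bn: nn.BatchNorm2d, next_conv: nn.Conv2d, alpha: float) -> None:
    """Function-preserving rescaling: multiply gamma and beta by alpha and
    divide the following convolution's weights by alpha (valid for alpha > 0
    when a ReLU sits between the two layers, since ReLU(a*x) = a*ReLU(x))."""
    with torch.no_grad():
        bn.weight.mul_(alpha)
        bn.bias.mul_(alpha)
        next_conv.weight.div_(alpha)
```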
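As a rough usage sketch tied to the CIFAR-10 hyperparameters quoted in the Experiment Setup row, one would run a standard SGD step on the training loss with learning rate µt = 0.01 and batch size 125, then call `ista_step(bn, lr=0.01, rho=0.001)` on each batch-norm layer selected for pruning during the 30k warm-up steps and with `rho=0.005` afterwards; with α = 1.0 the rescaling is a no-op. This mapping of hyperparameters to calls is an illustrative assumption, not a description of the authors' code.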