Dynamic Channel Pruning: Feature Boosting and Suppression

Authors: Xitong Gao, Yiren Zhao, Łukasz Dudziak, Robert Mullins, Cheng-zhong Xu

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We ran extensive experiments on CIFAR-10 (Krizhevsky et al., 2014) and the ImageNet ILSVRC2012 (Deng et al., 2009), two popular image classification datasets. ... Empirical results show that under the same speed-ups, FBS can produce models with validation accuracies surpassing all other channel pruning and dynamic conditional execution methods examined in the paper.
Researcher Affiliation | Academia | 1 Shenzhen Institutes of Advanced Technology, Shenzhen, China; 2,3,4 University of Cambridge, Cambridge, UK; 5 University of Macau, Macau, China; 1 xt.gao@siat.ac.cn, 2 yaz21@cam.ac.uk
Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Finally, the implementation of FBS and the optimized networks are fully open source and released to the public: https://github.com/deep-fry/mayo
Open Datasets | Yes | We ran extensive experiments on CIFAR-10 (Krizhevsky et al., 2014) and the ImageNet ILSVRC2012 (Deng et al., 2009), two popular image classification datasets.
Dataset Splits | Yes | Empirical results show that under the same speed-ups, FBS can produce models with validation accuracies surpassing all other channel pruning and dynamic conditional execution methods examined in the paper.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run its experiments.
Software Dependencies | No | The paper mentions using 'conventional stochastic gradient descent' for training but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | We trained M-CifarNet (see Appendix A) with a 0.01 learning rate and a 256 batch size. We reduced the learning rate by a factor of 10 for every 100 epochs. ... ILSVRC2012 classifiers, i.e. ResNet-18 and VGG-16, were trained with a procedure similar to Appendix A. The difference was that they were trained for a maximum of 35 epochs, the learning rate was decayed for every 20 epochs, and NS models were all pruned at 15 epochs.
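
The quoted setup amounts to plain SGD with a step learning-rate decay. Below is a minimal PyTorch sketch of that schedule only; it is not the authors' released implementation (their code uses the Mayo framework linked above), and the momentum value, total epoch budgets, and the omission of the FBS gating logic are assumptions made purely for illustration.

```python
# Minimal sketch of the reported training schedule (NOT the authors' Mayo code).
# Momentum and epoch budgets are assumed; FBS gating/pruning logic is omitted.
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

def train(model: nn.Module, train_loader, epochs, lr=0.01, lr_decay_every=100):
    """Plain SGD with step decay: start at `lr` and multiply it by 0.1
    every `lr_decay_every` epochs, as described in the quoted setup."""
    criterion = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=lr, momentum=0.9)  # momentum value is an assumption
    scheduler = StepLR(optimizer, step_size=lr_decay_every, gamma=0.1)

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:  # batch size 256 is set in the DataLoader
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()

# Approximate mapping to the quoted recipes:
#   CIFAR-10 (M-CifarNet):            train(model, loader, epochs=300, lr_decay_every=100)  # total epochs assumed
#   ILSVRC2012 (ResNet-18 / VGG-16):  train(model, loader, epochs=35,  lr_decay_every=20)
```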