Channel Pruning via Automatic Structure Search

Authors: Mingbao Lin, Rongrong Ji, Yuxin Zhang, Baochang Zhang, Yongjian Wu, Yonghong Tian

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments: "We conduct compression for representative networks, including VGGNet, GoogLeNet and ResNet-56/110 on CIFAR-10 [Krizhevsky et al., 2009], and ResNet-18/34/50/101/152 on ILSVRC-2012 [Russakovsky et al., 2015]."
Researcher Affiliation | Collaboration | Mingbao Lin¹, Rongrong Ji¹, Yuxin Zhang¹, Baochang Zhang², Yongjian Wu³, Yonghong Tian⁴; ¹Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, China; ²School of Automation Science and Electrical Engineering, Beihang University, China; ³Tencent Youtu Lab, Tencent Technology (Shanghai) Co., Ltd, China; ⁴School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Pseudocode | Yes | Algorithm 1: ABCPruner
Open Source Code | Yes | "The source codes can be available at https://github.com/lmbxmu/ABCPruner."
Open Datasets | Yes | "We conduct compression for representative networks, including VGGNet, GoogLeNet and ResNet-56/110 on CIFAR-10 [Krizhevsky et al., 2009], and ResNet-18/34/50/101/152 on ILSVRC-2012 [Russakovsky et al., 2015]."
Dataset Splits | No | The paper refers to 'Ttrain' and 'Ttest' for training and evaluation during the search and fine-tuning, but it does not explicitly mention a separate validation split or specify split proportions.
Hardware Specification | No | The paper does not report the hardware used for its experiments, such as GPU or CPU models, memory, or other machine specifications.
Software Dependencies | No | The paper names the optimization algorithm (SGD) but does not list specific libraries, frameworks, or version numbers for its software dependencies.
Experiment Setup | Yes | "We use the Stochastic Gradient Descent algorithm (SGD) for fine-tuning with momentum 0.9 and the batch size is set to 256. On CIFAR-10, the weight decay is set to 5e-3 and we fine-tune the network for 150 epochs with a learning rate of 0.01, which is then divided by 10 every 50 training epochs. On ILSVRC-2012, the weight decay is set to 1e-4 and 90 epochs are given for fine-tuning. The learning rate is set as 0.1, and divided by 10 every 30 epochs. ... For each structure, we train the pruned model N for two epochs to obtain its fitness. We empirically set T=2, n=3, and M=2 in the Alg. 1."
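
The fine-tuning recipe quoted above maps onto a standard SGD-with-step-decay configuration. Below is a minimal sketch of the CIFAR-10 settings, assuming a PyTorch-style training loop (the paper does not name its framework); `finetune_cifar10`, `pruned_model`, and `train_loader` are illustrative placeholders, not identifiers from the ABCPruner code.

```python
# Hedged sketch of the reported fine-tuning setup (CIFAR-10 values shown).
# For ILSVRC-2012 the paper reports weight_decay=1e-4, lr=0.1, 90 epochs,
# with the learning rate divided by 10 every 30 epochs.
import torch
import torch.nn as nn

def finetune_cifar10(pruned_model, train_loader, epochs=150, device="cuda"):
    """Fine-tune a pruned network with the hyperparameters quoted above."""
    model = pruned_model.to(device)
    criterion = nn.CrossEntropyLoss()
    # SGD with momentum 0.9; CIFAR-10 weight decay 5e-3, initial lr 0.01.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=5e-3)
    # Learning rate divided by 10 every 50 training epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

    model.train()
    for epoch in range(epochs):
        for images, targets in train_loader:  # batch size 256 per the paper
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

Note that this full schedule applies only to the final fine-tuning stage; during the structure search itself, each candidate pruned model is trained for just two epochs to estimate its fitness, per the quote above.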