Beyond Network Pruning: a Joint Search-and-Training Approach

Authors: Xiaotong Lu, Han Huang, Weisheng Dong, Xin Li, Guangming Shi

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on ResNet and VGGNet demonstrate the superior performance of our proposed method on popular datasets including CIFAR10, CIFAR100 and ImageNet." (Section 4, Experimental Results)
Researcher Affiliation | Academia | "Xiaotong Lu1, Han Huang1, Weisheng Dong1, Xin Li2, Guangming Shi1; 1Xidian University, 2West Virginia University; {xiaotonglu47, hanhuang8264}@gmail.com, wsdong@mail.xidian.edu.cn, xin.li@ieee.org, gmshi@xidian.edu.cn"
Pseudocode | Yes | "Algorithm 1: Sampler" and "Algorithm 2: Search-and-Training Algorithm"
Open Source Code | No | The paper does not provide a direct link to its source code or explicitly state that its code is open-source or publicly available.
Open Datasets | Yes | "Extensive experiments on ResNet and VGGNet demonstrate the superior performance of our proposed method on popular datasets including CIFAR10, CIFAR100 and ImageNet."
Dataset Splits | Yes | "For searching and training, we randomly extract 80% of the official training images as the training set Dtrain in Algorithm 2, and the rest as the validation set Dval." (a minimal split sketch follows the table)
Hardware Specification | Yes | "for the experiments on the CIFAR-10 dataset, we use one NVIDIA Titan XP GPU for training and searching, and NVIDIA 1080Ti for CIFAR-100." and "All experiments on the ImageNet dataset use 4 NVIDIA Titan XP GPUs with batch size of 256." and "we use 4 NVIDIA 2080Ti GPUs to validate our method on the ResNet-50 model."
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or any other software dependencies.
Experiment Setup | Yes | "For searching and training, we randomly extract 80% of the official training images as the training set Dtrain in Algorithm 2, and the rest as the validation set Dval. In our implementation, we search the compact networks with thresholds in the range [0.6, 0.65, 0.7, 0.75, 0.8] for each layer. The hyper-parameter λ is set to 0.1, γ is set to 2.0, and all parameters and weights are initialized by Kaiming normal in PyTorch. For different datasets, we apply different settings: 1) On the CIFAR datasets, we use SGD with momentum of 0.9 and weight decay of 0.00005 as the optimizer. At the beginning, we train the target network coarsely for 100 epochs with batch size 128. The learning rate starts from 0.1 and is reduced by a cosine scheduler. Then we search T = 30 compact networks, whose parameters are optimized on Dtrain for M (M = 40 for t ≤ 20, 30 for t > 20) epochs with learning rates of 0.05/0.01, corresponding to the different M. The weights are optimized on Dval for N = 5 epochs with a fixed learning rate of 0.001. In the fine-tuning stage, we set a batch size of 256 and a learning rate of 0.01, and optimize the selected compact network until convergence. 2) On the ImageNet dataset, we optimize the parameters via Adam with weight decay of 0.00001 and the weights via the same SGD as on CIFAR. For the ResNet model, we coarsely train 40 epochs with an initial learning rate of 0.1 and search T = 20 compact networks. M is set to 10 with a learning rate of 0.001 and N is set to 2. When fine-tuning, we set the initial learning rate to 0.001 and divide it by 10 every 20 epochs." (a hedged training-setup sketch follows the table)
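
The 80/20 split quoted under "Dataset Splits" can be reproduced with standard PyTorch utilities. The snippet below is a minimal sketch, not the authors' code: it assumes torchvision's CIFAR-10 loader, an arbitrary augmentation pipeline, and a fixed seed, none of which are specified in the paper; D_train and D_val mirror the paper's Dtrain/Dval.

    import torch
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms

    # Minimal sketch of the 80/20 split described in the paper
    # (assumption: torchvision CIFAR-10; the authors' splitting code is not public).
    transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    full_train = datasets.CIFAR10(root="./data", train=True,
                                  download=True, transform=transform)

    # 80% of the official training images -> Dtrain, the remaining 20% -> Dval.
    n_train = int(0.8 * len(full_train))
    n_val = len(full_train) - n_train
    D_train, D_val = random_split(
        full_train, [n_train, n_val],
        generator=torch.Generator().manual_seed(0))  # seed is an assumption

    train_loader = DataLoader(D_train, batch_size=128, shuffle=True)
    val_loader = DataLoader(D_val, batch_size=128, shuffle=False)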
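The CIFAR portion of the quoted experiment setup maps directly onto standard PyTorch primitives. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: model, train_loader, and the plain cross-entropy loss are placeholders, and it covers only the 100-epoch coarse-training stage (SGD, momentum 0.9, weight decay 5e-5, learning rate 0.1 with a cosine schedule, Kaiming-normal initialization), not the search of Algorithms 1 and 2.

    import torch
    import torch.nn as nn

    def init_kaiming(model: nn.Module) -> None:
        # "all parameters and weights are initialized by Kaiming normal in PyTorch"
        for m in model.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.kaiming_normal_(m.weight)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def coarse_train(model, train_loader, epochs=100, device="cuda"):
        # Coarse-training stage with the CIFAR hyper-parameters quoted above.
        model.to(device)
        init_kaiming(model)
        criterion = nn.CrossEntropyLoss()  # loss choice is an assumption
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                    momentum=0.9, weight_decay=5e-5)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
        for _ in range(epochs):
            model.train()
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
            scheduler.step()  # cosine decay from the initial rate of 0.1
        return model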