Runtime Neural Pruning

Authors: Ji Lin, Yongming Rao, Jiwen Lu, Jie Zhou

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted experiments on three different datasets including CIFAR-10, CIFAR-100 [22] and ILSVRC2012 [36] to show the effectiveness of our method. Experimental results on the CIFAR [22] and ImageNet [36] datasets show that our framework successfully learns to allocate different amounts of computational resources for different input images, and achieves much better performance at the same cost.
Researcher Affiliation | Academia | Ji Lin, Department of Automation, Tsinghua University, lin-j14@mails.tsinghua.edu.cn; Yongming Rao, Department of Automation, Tsinghua University, raoyongming95@gmail.com; Jiwen Lu, Department of Automation, Tsinghua University, lujiwen@tsinghua.edu.cn; Jie Zhou, Department of Automation, Tsinghua University, jzhou@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1 Runtime neural pruning for solving optimization problem (1):
Input: training set with labels {X}
Output: backbone CNN C, decision network D
1: initialize: train C in the normal way or initialize C with a pre-trained model
2: for i ← 1, 2, ..., M do
3:   // train decision network
4:   for j ← 1, 2, ..., N1 do
5:     Sample random minibatch from {X}
6:     Forward and sample ϵ-greedy actions {st, at}
7:     Compute corresponding rewards {rt}
8:     Backward Q values for each stage and generate ∇θ L_re
9:     Update θ using ∇θ L_re
10:   end for
11:   // fine-tune backbone CNN
12:   for k ← 1, 2, ..., N2 do
13:     Sample random minibatch from {X}
14:     Forward and calculate L_cls after runtime pruning by D
15:     Backward and generate ∇C L_cls
16:     Update C using ∇C L_cls
17:   end for
18: end for
19: return C and D
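The alternating structure of Algorithm 1 can be sketched as a toy Python program. This is an illustrative simplification, not the paper's implementation: the backbone is a linear model on random regression data, the decision network is reduced to a single-state (bandit-style) tabular Q-function, and every name and hyper-parameter (`train_rnp_toy`, `k_groups`, `penalty`, etc.) is an assumption for illustration.

```python
import numpy as np

def train_rnp_toy(n=256, d=16, k_groups=4, M=3, N1=50, N2=50,
                  penalty=0.1, lr_q=0.1, lr_c=0.01, seed=0):
    """Toy sketch of RNP's alternating optimization (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d)
    W = rng.normal(size=d) * 0.1      # backbone weights, split into k groups
    Q = np.zeros(k_groups)            # Q-value per action (how many groups to keep)
    group = d // k_groups

    def loss(a, idx):
        keep = (a + 1) * group        # action a keeps the first a+1 channel groups
        pred = X[idx, :keep] @ W[:keep]
        return float(np.mean((pred - y[idx]) ** 2))

    eps = 1.0
    for i in range(M):
        # --- train the decision "network" (here: tabular Q) with eps-greedy actions
        for j in range(N1):
            idx = rng.choice(n, size=32, replace=False)
            a = int(rng.integers(k_groups)) if rng.random() < eps else int(np.argmax(Q))
            r = -penalty * (a + 1) - loss(a, idx)   # reward trades accuracy vs. cost
            Q[a] += lr_q * (r - Q[a])               # one-step Q update
            eps = max(0.1, eps - 0.9 / (M * N1))    # linear anneal from 1.0 to 0.1
        # --- fine-tune the backbone under the current (greedy) pruning decision
        a = int(np.argmax(Q))
        keep = (a + 1) * group
        for t in range(N2):
            idx = rng.choice(n, size=32, replace=False)
            pred = X[idx, :keep] @ W[:keep]
            grad = X[idx, :keep].T @ (pred - y[idx]) / len(idx)
            W[:keep] -= lr_c * grad                 # SGD on the pruned subnetwork
    return W, Q, eps
```

The key design point mirrored from the pseudocode is the two-phase inner loop: the decision maker is trained with rewards that penalize extra feature-map computation while the backbone is frozen, then the backbone is fine-tuned under the pruning decisions while the decision maker is frozen.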
Open Source Code | No | The paper mentions using 'the modified Caffe toolbox [20]' for implementation but does not provide a specific link or statement about releasing their own source code for the RNP framework.
Open Datasets | Yes | We conducted experiments on three different datasets including CIFAR-10, CIFAR-100 [22] and ILSVRC2012 [36] to show the effectiveness of our method.
Dataset Splits | Yes | We evaluated the top-5 error using single-view testing on the ILSVRC2012-val set and trained the RNP model using the ILSVRC2012-train set.
Hardware Specification | Yes | Inference time was measured on a Titan X (Pascal) GPU with batch size 64.
Software Dependencies | No | The paper states 'All our experiments were implemented using the modified Caffe toolbox [20]' but does not provide a specific version number for Caffe or any other software dependencies.
Experiment Setup | Yes | The initialization was trained using SGD with an initial learning rate of 0.01, decayed by a factor of 10 after 120 and 160 epochs, for 200 epochs in total. The rest of the training was conducted using RMSprop [42] with a learning rate of 1e-6. For the ϵ-greedy strategy, the hyper-parameter ϵ was annealed linearly from 1.0 to 0.1 in the beginning and fixed at 0.1 thereafter. For most experiments, we set the number of convolutional groups to k = 4... During training, we set the penalty for extra feature map calculation as p = 0.1... The scale factor α was set such that the average of αLcls is approximately 1.
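The two schedules quoted above (step-decayed SGD learning rate and linearly annealed ϵ) can be written as small helper functions. This is a minimal sketch; the function names and the `anneal_steps` count are assumptions for illustration, since the paper does not state over how many steps ϵ is annealed.

```python
def sgd_lr(epoch, base=0.01, milestones=(120, 160), gamma=0.1):
    """Initial LR 0.01, decayed by 10x after epochs 120 and 160 (200 epochs total)."""
    lr = base
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

def epsilon(step, anneal_steps=10000, start=1.0, end=0.1):
    """Linearly anneal eps from 1.0 to 0.1, then hold it fixed at 0.1.

    anneal_steps is a hypothetical value, not taken from the paper.
    """
    if step >= anneal_steps:
        return end
    return start + (end - start) * step / anneal_steps
```

For example, `sgd_lr(130)` gives 0.001 (one decay applied) and `epsilon(step)` returns 0.1 for every step at or beyond `anneal_steps`.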