Runtime Neural Pruning
Authors: Ji Lin, Yongming Rao, Jiwen Lu, Jie Zhou
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on three different datasets including CIFAR-10, CIFAR-100 [22] and ILSVRC2012 [36] to show the effectiveness of our method. Experimental results on the CIFAR [22] and ImageNet [36] datasets show that our framework successfully learns to allocate different amounts of computational resources for different input images, and achieves much better performance at the same cost. |
| Researcher Affiliation | Academia | Ji Lin, Department of Automation, Tsinghua University, lin-j14@mails.tsinghua.edu.cn; Yongming Rao, Department of Automation, Tsinghua University, raoyongming95@gmail.com; Jiwen Lu, Department of Automation, Tsinghua University, lujiwen@tsinghua.edu.cn; Jie Zhou, Department of Automation, Tsinghua University, jzhou@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1 Runtime neural pruning for solving optimization problem (1). Input: training set with labels {X}. Output: backbone CNN C, decision network D. 1: initialize: train C in the normal way or initialize C with a pre-trained model 2: for i = 1, 2, ..., M do 3: // train decision network 4: for j = 1, 2, ..., N1 do 5: Sample random minibatch from {X} 6: Forward and sample ϵ-greedy actions {st, at} 7: Compute corresponding rewards {rt} 8: Backward Q values for each stage and generate ∇θLre 9: Update θ using ∇θLre 10: end for 11: // fine-tune backbone CNN 12: for k = 1, 2, ..., N2 do 13: Sample random minibatch from {X} 14: Forward and calculate Lcls after runtime pruning by D 15: Backward and generate ∇CLcls 16: Update C using ∇CLcls 17: end for 18: end for 19: return C and D |
| Open Source Code | No | The paper mentions using 'the modified Caffe toolbox [20]' for implementation but does not provide a specific link or statement about releasing their own source code for the RNP framework. |
| Open Datasets | Yes | We conducted experiments on three different datasets including CIFAR-10, CIFAR-100 [22] and ILSVRC2012 [36] to show the effectiveness of our method. |
| Dataset Splits | Yes | We evaluated the top-5 error using single-view testing on ILSVRC2012-val set and trained RNP model using ILSVRC2012-train set. |
| Hardware Specification | Yes | Inference times were measured on a Titan X (Pascal) GPU with batch size 64. |
| Software Dependencies | No | The paper states 'All our experiments were implemented using the modified Caffe toolbox [20]' but does not provide a specific version number for Caffe or any other software dependencies. |
| Experiment Setup | Yes | The initialization was trained using SGD, with an initial learning rate of 0.01, decayed by a factor of 10 after 120 and 160 epochs, for 200 epochs in total. The rest of the training was conducted using RMSprop [42] with a learning rate of 1e-6. For the ϵ-greedy strategy, the hyper-parameter ϵ was annealed linearly from 1.0 to 0.1 in the beginning and fixed at 0.1 thereafter. For most experiments, we set the number of convolutional groups to k = 4... During training, we set the penalty for extra feature map calculation as p = 0.1... The scale factor α was set such that the average αLcls is approximately 1 |
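The two schedules quoted in the Experiment Setup row (step-decayed SGD learning rate, and ϵ annealed linearly from 1.0 to 0.1) can be sketched as small helper functions. This is a minimal illustration of those schedules as described, not code from the paper; the function names and the `anneal_steps` parameter are our own.

```python
def epsilon(step, anneal_steps):
    """Linear anneal of the epsilon-greedy exploration rate from 1.0
    down to 0.1 over `anneal_steps` steps, then held fixed at 0.1."""
    if step >= anneal_steps:
        return 0.1
    return 1.0 - 0.9 * step / anneal_steps

def sgd_lr(epoch, base_lr=0.01, milestones=(120, 160), gamma=0.1):
    """Step decay matching the quoted setup: start at 0.01 and divide
    by 10 after epochs 120 and 160 (training runs 200 epochs total)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

For example, `epsilon(0, 1000)` gives 1.0 and `epsilon(2000, 1000)` gives the floor of 0.1, while `sgd_lr(130)` has decayed once to roughly 1e-3.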
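Algorithm 1's decision network chooses, per stage, how many of the k = 4 convolutional groups to compute, trading accuracy against a per-group penalty p = 0.1. A loose sketch of the ϵ-greedy action selection and a penalized reward of that shape is below; the exact reward in the paper may differ in form, so treat `reward` as illustrative only, and note that `select_action`, `rng`, and `cls_loss` are our own names.

```python
import random

def select_action(q_values, eps, rng=random):
    """Epsilon-greedy choice over the k channel groups (k = 4 in the
    quoted setup): with probability eps pick a random group count,
    otherwise pick the one with the highest predicted Q value."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def reward(alpha, cls_loss, action, penalty=0.1):
    """Sketch of a penalized reward: a scaled classification term
    (alpha set so alpha * Lcls is roughly 1, per the quoted setup)
    minus a penalty p for each extra feature-map group computed."""
    return -alpha * cls_loss - penalty * action
```

With exploration off (`eps=0.0`), `select_action` reduces to a pure argmax over the Q values, which is the exploitation path the trained decision network would follow at test time.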