Where to Prune: Using LSTM to Guide End-to-end Pruning

Authors: Jing Zhong, Guiguang Ding, Yuchen Guo, Jungong Han, Bin Wang

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our approach is capable of reducing 70.1% FLOPs for VGG and 47.5% for Resnet-56 with comparable accuracy. Also, the learning results seem to reveal the sensitivity of each network layer.
Researcher Affiliation | Academia | Jing Zhong, Guiguang Ding, Yuchen Guo, Jungong Han, Bin Wang; Beijing National Laboratory for Information Science and Technology (BNList), School of Software, Tsinghua University, Beijing 100084, China; School of Computing & Communications, Lancaster University, UK; {zhongjingheart,yuchen.w.guo}@gmail.com, {dinggg,wangbin}@tsinghua.edu.cn, jungong.han@northumbria.ac.uk
Pseudocode | No | The paper describes the method conceptually and through a framework diagram (Figure 2), but does not provide a formal pseudocode block or algorithm.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | We empirically apply our method on three benchmark datasets: CIFAR-10, CIFAR-100, and MNIST. Two CIFAR datasets [Krizhevsky and Hinton, 2009] contain 50000 training images and 10000 test images. The MNIST contains 60000 and 10000 images for training and testing respectively.
Dataset Splits | Yes | In all the datasets, 10% images are split from training set as validation set used for evaluating new network structures and calculating their reward signals to LSTM. (A data-loading and split sketch follows the table.)
Hardware Specification | Yes | All the experiments are implemented with PyTorch on one NVIDIA TITAN X GPU.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | In the experiments, we train the initial models from scratch and calculate their accuracies as baselines. The pruning rate Rprune is set to 0.2. During fine-tuning, the teacher instructs the student to train for 30 epochs on CIFAR datasets and 10 epochs on MNIST dataset. When LSTM no longer produces better network structures within 10 epochs, the algorithm is terminated. We retrain the network with the best reward for 250 epochs on CIFAR and 100 epochs on MNIST. Both training and validation datasets are used for retraining the network with fixed learning rate 0.001 to get its final accuracy. In every epoch for training LSTM, 5 parent network architectures with biggest rewards are picked and fed into LSTM successively. Their rewards are taken as baselines b in policy gradient method. If there are no more than 5 structures in local, all the local networks are taken as inputs. In the first epoch, the input is the pre-trained network.
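
The dataset setup quoted in the Open Datasets and Dataset Splits rows can be reproduced with a few lines of PyTorch/torchvision. This is a minimal sketch, assuming a standard torchvision CIFAR-10 pipeline; the batch size, transform, and random seed are illustrative choices, not values reported in the paper.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# CIFAR-10 as described in the paper: 50000 training / 10000 test images.
transform = transforms.ToTensor()  # illustrative; the paper's preprocessing is not specified here
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Hold out 10% of the training images as a validation set; per the paper, this split
# is used to evaluate candidate pruned structures and compute reward signals for the LSTM.
val_size = len(full_train) // 10           # 5000 images
train_size = len(full_train) - val_size    # 45000 images
train_set, val_set = random_split(
    full_train, [train_size, val_size], generator=torch.Generator().manual_seed(0)
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```

The same pattern applies to CIFAR-100 (`datasets.CIFAR100`) and MNIST (`datasets.MNIST`), whose 60000 training images would yield a 54000/6000 train/validation split.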
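The Experiment Setup row describes an LSTM controller trained with a policy gradient whose baseline b is the parent network's reward. The sketch below shows only that REINFORCE-with-baseline update; since the paper does not publish code, the controller architecture (embedding and hidden sizes, a binary prune/keep action per layer) and all function names here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PruningController(nn.Module):
    """Hypothetical LSTM controller: reads one token per prunable layer and emits
    a prune/keep decision for it (the paper's exact action space may differ)."""
    def __init__(self, num_layers, hidden_size=64, num_actions=2):
        super().__init__()
        self.embed = nn.Embedding(num_layers, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_actions)

    def forward(self, layer_ids):                       # layer_ids: (1, num_layers)
        h, _ = self.lstm(self.embed(layer_ids))
        return torch.distributions.Categorical(logits=self.head(h))

def sample_structure(controller, layer_ids):
    """Sample a candidate pruned structure and keep its total log-probability
    for the later policy-gradient update."""
    dist = controller(layer_ids)
    actions = dist.sample()
    return actions, dist.log_prob(actions).sum()

def reinforce_update(optimizer, log_prob, reward, baseline):
    """REINFORCE with baseline: `reward` is the validation reward of the
    pruned-and-fine-tuned child network, `baseline` is the parent's reward (b)."""
    loss = -(reward - baseline) * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Illustrative usage for a 13-layer VGG-style student:
controller = PruningController(num_layers=13)
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)
layer_ids = torch.arange(13).unsqueeze(0)
actions, log_prob = sample_structure(controller, layer_ids)
# ... prune the student according to `actions`, fine-tune (30 epochs on CIFAR),
# evaluate on the 10% validation split to obtain the reward, then:
reinforce_update(optimizer, log_prob, reward=0.91, baseline=0.90)  # dummy reward values
```

In the paper's setup this update would be run once per controller epoch for each of the (up to) 5 stored parent structures with the largest rewards, each serving as the baseline for the children it spawns.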