Where to Prune: Using LSTM to Guide End-to-end Pruning

Authors: Jing Zhong, Guiguang Ding, Yuchen Guo, Jungong Han, Bin Wang

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our approach is capable of reducing 70.1% FLOPs for VGG and 47.5% for Resnet-56 with comparable accuracy. Also, the learning results seem to reveal the sensitivity of each network layer.
Researcher Affiliation | Academia | Jing Zhong, Guiguang Ding, Yuchen Guo, Jungong Han, Bin Wang; Beijing National Laboratory for Information Science and Technology (BNList), School of Software, Tsinghua University, Beijing 100084, China; School of Computing & Communications, Lancaster University, UK; {zhongjingheart,yuchen.w.guo}@gmail.com, {dinggg,wangbin}@tsinghua.edu.cn, jungong.han@northumbria.ac.uk
Pseudocode | No | The paper describes the method conceptually and through a framework diagram (Figure 2), but does not provide a formal pseudocode block or algorithm.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | We empirically apply our method on three benchmark datasets: CIFAR-10, CIFAR-100, and MNIST. Two CIFAR datasets [Krizhevsky and Hinton, 2009] contain 50000 training images and 10000 test images. The MNIST contains 60000 and 10000 images for training and testing respectively.
Dataset Splits | Yes | In all the datasets, 10% images are split from training set as validation set used for evaluating new network structures and calculating their reward signals to LSTM. (A data-loading and split sketch follows the table.)
Hardware Specification | Yes | All the experiments are implemented with PyTorch on one NVIDIA TITAN X GPU.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | In the experiments, we train the initial models from scratch and calculate their accuracies as baselines. The pruning rate Rprune is set to 0.2. During fine-tuning, the teacher instructs the student to train for 30 epochs on CIFAR datasets and 10 epochs on MNIST dataset. When LSTM no longer produces better network structures within 10 epochs, the algorithm is terminated. We retrain the network with the best reward for 250 epochs on CIFAR and 100 epochs on MNIST. Both training and validation datasets are used for retraining the network with fixed learning rate 0.001 to get its final accuracy. In every epoch for training LSTM, 5 parent network architectures with biggest rewards are picked and fed into LSTM successively. Their rewards are taken as baselines b in policy gradient method. If there are no more than 5 structures in local, all the local networks are taken as inputs. In the first epoch, the input is the pre-trained network.
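
The dataset setup quoted in the Open Datasets and Dataset Splits rows can be reproduced with a few lines of PyTorch/torchvision. This is a minimal sketch, assuming a standard torchvision CIFAR-10 pipeline; the batch size, transform, and random seed are illustrative choices, not values reported in the paper.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# CIFAR-10 as described in the paper: 50000 training / 10000 test images.
transform = transforms.ToTensor()  # illustrative; the paper's preprocessing is not specified here
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Hold out 10% of the training images as a validation set; per the paper, this split
# is used to evaluate candidate pruned structures and compute reward signals for the LSTM.
val_size = len(full_train) // 10           # 5000 images
train_size = len(full_train) - val_size    # 45000 images
train_set, val_set = random_split(
    full_train, [train_size, val_size], generator=torch.Generator().manual_seed(0)
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```

The same pattern applies to CIFAR-100 (`datasets.CIFAR100`) and MNIST (`datasets.MNIST`), whose 60000 training images would yield a 54000/6000 train/validation split.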
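The Experiment Setup row describes an LSTM controller trained with a policy gradient whose baseline b is the parent network's reward. The sketch below shows only that REINFORCE-with-baseline update; since the paper does not publish code, the controller architecture (embedding and hidden sizes, a binary prune/keep action per layer) and all function names here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PruningController(nn.Module):
    """Hypothetical LSTM controller: reads one token per prunable layer and emits
    a prune/keep decision for it (the paper's exact action space may differ)."""
    def __init__(self, num_layers, hidden_size=64, num_actions=2):
        super().__init__()
        self.embed = nn.Embedding(num_layers, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_actions)

    def forward(self, layer_ids):                       # layer_ids: (1, num_layers)
        h, _ = self.lstm(self.embed(layer_ids))
        return torch.distributions.Categorical(logits=self.head(h))

def sample_structure(controller, layer_ids):
    """Sample a candidate pruned structure and keep its total log-probability
    for the later policy-gradient update."""
    dist = controller(layer_ids)
    actions = dist.sample()
    return actions, dist.log_prob(actions).sum()

def reinforce_update(optimizer, log_prob, reward, baseline):
    """REINFORCE with baseline: `reward` is the validation reward of the
    pruned-and-fine-tuned child network, `baseline` is the parent's reward (b)."""
    loss = -(reward - baseline) * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Illustrative usage for a 13-layer VGG-style student:
controller = PruningController(num_layers=13)
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)
layer_ids = torch.arange(13).unsqueeze(0)
actions, log_prob = sample_structure(controller, layer_ids)
# ... prune the student according to `actions`, fine-tune (30 epochs on CIFAR),
# evaluate on the 10% validation split to obtain the reward, then:
reinforce_update(optimizer, log_prob, reward=0.91, baseline=0.90)  # dummy reward values
```

In the paper's setup this update would be run once per controller epoch for each of the (up to) 5 stored parent structures with the largest rewards, each serving as the baseline for the children it spawns.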