Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

Authors: Xin Dong, Shangyu Chen, Sinno Jialin Pan

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on benchmark datasets to demonstrate the effectiveness of our pruning method compared with several state-of-the-art baseline methods."
Researcher Affiliation | Academia | Xin Dong, Nanyang Technological University, Singapore, n1503521a@e.ntu.edu.sg; Shangyu Chen, Nanyang Technological University, Singapore, schen025@e.ntu.edu.sg; Sinno Jialin Pan, Nanyang Technological University, Singapore, sinnopan@ntu.edu.sg
Pseudocode | Yes | "The procedure of our pruning algorithm for a fully-connected layer l is summarized as follows. Step 1: Get layer input y_{l-1} from a well-trained deep network. Step 2: Calculate the Hessian matrix H_l^{ii}, for i = 1, ..., m_l, and its pseudo-inverse over the dataset, and get the whole pseudo-inverse of the Hessian matrix. Step 3: Compute the optimal parameter change δΘ_l and the sensitivity L_q for each parameter at layer l. Set a tolerable error threshold ε. Step 4: Pick the parameters Θ_l^{[q]} with the smallest sensitivity scores. Step 5: If √L_q ≤ ε, prune the parameter Θ_l^{[q]}, get new parameter values via Θ̂_l = Θ_l + δΘ_l, and repeat Step 4; otherwise stop pruning." (A minimal code sketch of these steps follows this table.)
Open Source Code | Yes | "Codes of our work are released at: https://github.com/csyhhu/L-OBS."
Open Datasets | Yes | "The deep architectures used for experiments include: LeNet-300-100 [2] and LeNet-5 [2] on the MNIST dataset, CIFAR-Net [24] on the CIFAR-10 dataset, AlexNet [25] and VGG-16 [3] on the ImageNet ILSVRC-2012 dataset."
Dataset Splits | No | The paper mentions using well-known datasets but does not explicitly provide the specific training, validation, or test dataset splits (e.g., percentages or sample counts) within the text.
Hardware Specification | Yes | "2.9 hours on 48 Intel Xeon(R) CPU E5-1650 to compute Hessians and 3.1 hours on NVIDIA Titan X GPU to retrain pruned model"
Software Dependencies | No | The paper mentions using "TensorFlow [26]" but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | No | The paper states "The retraining batch size, crop method and other hyper-parameters are under the same setting as used in LWC," which defers the specific details to another paper rather than providing them explicitly in the main text. It mentions the compression ratios used, but not general training hyperparameters such as learning rate, epochs, or optimizer settings.
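
To make the quoted pruning procedure concrete, here is a minimal NumPy sketch of the five steps for a single output unit of one fully-connected layer. The function name lobs_prune_unit, the damp ridge term, and the per-unit framing are illustrative assumptions on our part, not taken from the paper or the released repository (https://github.com/csyhhu/L-OBS):

```python
import numpy as np

def lobs_prune_unit(w, Y, eps, damp=1e-6):
    """Sketch of L-OBS pruning for one output unit of a fully-connected layer.

    w    : (m,) weights of the unit (fan-in m)
    Y    : (n, m) layer inputs y_{l-1} collected from the well-trained network
    eps  : tolerable error threshold (Step 3)
    damp : small ridge term (an assumption here) to keep the inverse well conditioned
    """
    n, m = Y.shape
    # Step 2: layer-wise Hessian over the dataset and its pseudo-inverse.
    H = Y.T @ Y / n + damp * np.eye(m)
    H_inv = np.linalg.pinv(H)
    diag = np.diag(H_inv)

    w = w.astype(float).copy()
    pruned = np.zeros(m, dtype=bool)
    while not pruned.all():
        # Step 3: sensitivity L_q = w_q^2 / (2 [H^{-1}]_{qq}) for each live weight.
        L = np.where(pruned, np.inf, w ** 2 / (2.0 * diag))
        q = int(np.argmin(L))          # Step 4: least-sensitive parameter
        if np.sqrt(L[q]) > eps:        # Step 5: stop once the error budget is hit
            break
        # Optimal compensating change: delta_w = -(w_q / [H^{-1}]_{qq}) * H^{-1} e_q,
        # whose q-th component cancels w_q exactly.
        w -= (w[q] / diag[q]) * H_inv[:, q]
        pruned[q] = True
        w[pruned] = 0.0  # keep pruned weights exactly zero (simplifying approximation)
    return w, pruned
```

The pseudo-inverse and the √L_q ≤ ε stopping rule follow Steps 2 and 5 as quoted; re-zeroing already-pruned weights after each compensating update is a simplification, since the exact OBS change would use a Hessian inverse constrained to the remaining weights.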