WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Authors: Sidak Pal Singh, Dan Alistarh

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning. Further, even when iterative, gradual pruning is allowed, our method results in a gain in test accuracy over the state-of-the-art approaches, for standard image classification datasets such as ImageNet ILSVRC. WoodFisher belongs to the first class of methods, but can be used together with both iterative and dynamic methods." The paper then applies WoodFisher to compressing CNNs on image classification tasks (Section 5, Experimental Results).
Researcher Affiliation | Collaboration | Sidak Pal Singh, ETH Zurich, Switzerland (contact@sidakpal.com); Dan Alistarh, IST Austria & Neural Magic, Inc. (dan.alistarh@ist.ac.at)
Pseudocode | No | The paper describes its methods using mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The code is available at https://github.com/IST-DASLab/WoodFisher.
Open Datasets | Yes | "Standard image classification datasets such as ImageNet ILSVRC. We consider the Hessian (H) and empirical Fisher (F̂) matrices for neural networks trained on standard datasets like CIFAR10 and MNIST." (A sketch of the inverse empirical Fisher used for pruning follows the table.)
Dataset Splits | No | The paper mentions standard datasets (CIFAR10, MNIST, ImageNet) and reports test accuracy, but it does not give explicit percentages or sample counts for training, validation, and test splits, nor does it describe a distinct validation set in enough detail for reproducibility.
Hardware Specification | Yes | "All our IMAGENET experiments are run for 100 epochs on 4 NVIDIA V100 GPUs (i.e., 2.5 days for RESNET-50 and 1 day for MOBILENETV1)."
Software Dependencies | No | The paper does not explicitly list the software dependencies, with version numbers, required to reproduce the experiments.
Experiment Setup | Yes | "All our IMAGENET experiments are run for 100 epochs on 4 NVIDIA V100 GPUs... In terms of the pruning schedule, we follow the polynomial scheme of [19]... Typically, we use 80 or 240 such averaged gradients over a mini-batch of size 100... We used 16,000 samples to estimate the diagonal Fisher, whereas WoodFisher performs well even with 1,000 samples..." Detailed results are in Appendix S5.1, which also reports one-shot pruning of MOBILENETV1 on IMAGENET and ablations over chunk size, dampening λ, and the number of samples used for the Fisher computation. (A sketch of the polynomial pruning schedule follows the table.)
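
The core computation referenced in the table above is an estimate of the inverse empirical Fisher, built from a modest number of (mini-batch-averaged) gradients and plugged into the classical Optimal Brain Surgeon pruning statistic. Below is a minimal NumPy sketch of that idea under our own naming: it follows the rank-one Sherman-Morrison (Woodbury-style) recursion the paper describes, but the dampening value, array shapes, and toy data are illustrative assumptions rather than the authors' implementation, which operates on blocks (chunks) of the full matrix rather than the dense form shown here.

    import numpy as np

    def inverse_empirical_fisher(grads, damp=1e-5):
        # Inverse of (damp * I + (1/N) * sum_n g_n g_n^T), built incrementally
        # via rank-one Sherman-Morrison updates. `grads` has shape (N, d).
        n, d = grads.shape
        f_inv = np.eye(d) / damp                      # dampened initialization
        for g in grads:
            fg = f_inv @ g                            # F^{-1} g
            f_inv -= np.outer(fg, fg) / (n + g @ fg)  # rank-one update
        return f_inv

    def obs_statistic(w, f_inv):
        # Optimal Brain Surgeon saliency: estimated loss increase from
        # zeroing each individual weight.
        return w ** 2 / (2.0 * np.diag(f_inv))

    # Toy usage (data and sizes are illustrative): prune the least salient
    # weight and compute the compensating update to the remaining weights.
    rng = np.random.default_rng(0)
    w = rng.normal(size=10)
    grads = rng.normal(size=(240, 10))                # e.g. 240 averaged gradients
    f_inv = inverse_empirical_fisher(grads)
    q = int(np.argmin(obs_statistic(w, f_inv)))
    delta_w = -(w[q] / f_inv[q, q]) * f_inv[:, q]     # OBS update for pruning weight q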
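
The experiment-setup row also quotes the polynomial gradual-pruning schedule of [19] (Zhu & Gupta). The following small sketch shows what such a schedule computes; the start/end steps and sparsity targets here are illustrative assumptions, not the values used in the paper's runs.

    def polynomial_sparsity(step, s_init, s_final, start_step, end_step, power=3):
        # Sparsity ramps from s_init to s_final between start_step and end_step,
        # following the cubic polynomial schedule of Zhu & Gupta ([19] in the paper).
        if step <= start_step:
            return s_init
        if step >= end_step:
            return s_final
        frac = (step - start_step) / (end_step - start_step)
        return s_final + (s_init - s_final) * (1.0 - frac) ** power

    # Illustrative example: ramp from 5% to 90% sparsity between epochs 10 and 70
    # of a 100-epoch run.
    schedule = [polynomial_sparsity(e, 0.05, 0.90, 10, 70) for e in range(100)]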