Data-Efficient Structured Pruning via Submodular Optimization

Authors: Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime.
Researcher Affiliation | Collaboration | Marwa El Halabi (Samsung SAIT AI Lab, Montreal); Suraj Srinivas (Harvard University); Simon Lacoste-Julien (Mila, Université de Montréal; Samsung SAIT AI Lab, Montreal)
Pseudocode | Yes | Algorithm 1 GREEDY
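For context, a GREEDY routine in a submodular-optimization paper typically follows the standard greedy template for monotone submodular maximization under a cardinality constraint: repeatedly add the element with the largest marginal gain. Below is a minimal sketch of that template; the `objective` function and candidate set are placeholders for illustration, not the paper's actual pruning objective or implementation.

```python
# Generic greedy selection under a cardinality constraint (budget).
# `objective` stands in for a submodular set function; it is an assumption
# for illustration, not the authors' layerwise pruning objective.

def greedy(candidates, budget, objective):
    selected = set()
    remaining = set(candidates)
    for _ in range(min(budget, len(remaining))):
        base = objective(selected)
        # Pick the element with the largest marginal gain F(S + e) - F(S).
        best = max(remaining, key=lambda e: objective(selected | {e}) - base)
        selected.add(best)
        remaining.remove(best)
    return selected

# Toy usage: pick 3 of 10 candidates under a modular (hence submodular) score.
weights = {i: (i * 7) % 10 for i in range(10)}
print(greedy(range(10), 3, lambda s: sum(weights[e] for e in s)))
```

For a monotone submodular objective, this greedy rule carries the classical (1 - 1/e) approximation guarantee of Nemhauser et al. [1978].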
Open Source Code | Yes | The code for reproducing all experiments is available at https://github.com/marwash25/subpruning.
Open Datasets | Yes | LeNet model [LeCun et al., 1989] on the MNIST dataset [LeCun et al., 1998], and on the ResNet56 [He et al., 2016] and VGG11 [Simonyan and Zisserman, 2015] models on the CIFAR-10 dataset [Krizhevsky et al., 2009].
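Both datasets are available through torchvision. A minimal sketch of fetching them follows; the root directory and ToTensor-only transforms are assumptions for illustration, not the authors' preprocessing pipeline.

```python
# Fetch the two datasets named above via torchvision; paths and transforms
# are illustrative assumptions, not the paper's exact preprocessing.
import torchvision
import torchvision.transforms as T

mnist = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
cifar10 = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
```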
Dataset Splits | Yes | We report top-1 accuracy results evaluated on the validation set, as we vary the compression ratio (original size / pruned size). Unless otherwise specified, we use the per-layer budget selection method described in Section 5.2 for all the layerwise pruning methods... We set aside a subset of the training set to use as a verification set.
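One way to set aside a verification subset from the training set, as the quoted setup describes, is `torch.utils.data.random_split`. The 90/10 proportions and the synthetic stand-in dataset below are assumptions; the paper's quoted text does not state the split size.

```python
# Carve a held-out verification set off the training set; split size and
# the random stand-in dataset are assumptions for illustration.
import torch
from torch.utils.data import TensorDataset, random_split

full_train = TensorDataset(torch.randn(1000, 1, 28, 28),
                           torch.randint(0, 10, (1000,)))  # stand-in for MNIST
n_verif = len(full_train) // 10  # assumed 10% holdout
train_set, verif_set = random_split(
    full_train,
    [len(full_train) - n_verif, n_verif],
    generator=torch.Generator().manual_seed(0),  # reproducible split
)
```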
Hardware Specification | Yes | All experiments are run on an internal cluster with NVIDIA V100 or A100 GPUs.
Software Dependencies | No | The code is written in PyTorch [Paszke et al., 2017]. No specific version number for PyTorch or other software dependencies is provided.
Experiment Setup | Yes | To compute the gradients and activations used for pruning in LAYERSAMPLING, ACTGRAD, LAYERACTGRAD, and our method's variants, we use four batches of 128 training images, i.e., n = 512, which corresponds to 1% of the training data in MNIST and CIFAR-10. All models are trained for 100 epochs using SGD with momentum 0.9, weight decay 5e-4, and an initial learning rate of 0.1, which is reduced by a factor of 10 at epochs 50 and 75.
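The quoted recipe maps directly onto standard PyTorch components. Below is a minimal sketch of that schedule; the linear model and the random 512-sample loader (four batches of 128, mirroring n = 512 above) are placeholders, not LeNet, ResNet56, or VGG11, and not the authors' training script.

```python
# Sketch of the reported schedule: 100 epochs of SGD, momentum 0.9, weight
# decay 5e-4, LR 0.1 divided by 10 at epochs 50 and 75. Model and data are
# placeholder assumptions for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(32 * 32 * 3, 10)  # placeholder network
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 32 * 32 * 3), torch.randint(0, 10, (512,))),
    batch_size=128,  # four batches of 128, i.e., n = 512
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[50, 75], gamma=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    scheduler.step()  # LR: 0.1 -> 0.01 at epoch 50 -> 0.001 at epoch 75
```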