Data-Efficient Structured Pruning via Submodular Optimization
Authors: Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime. |
| Researcher Affiliation | Collaboration | Marwa El Halabi (Samsung SAIT AI Lab, Montreal); Suraj Srinivas (Harvard University); Simon Lacoste-Julien (Mila, Université de Montréal; Samsung SAIT AI Lab, Montreal) |
| Pseudocode | Yes | Algorithm 1 GREEDY (an illustrative sketch is given below the table) |
| Open Source Code | Yes | The code for reproducing all experiments is available at https://github.com/marwash25/subpruning. |
| Open Datasets | Yes | LeNet model [LeCun et al., 1989] on the MNIST dataset [LeCun et al., 1998], and on the ResNet56 [He et al., 2016] and VGG11 [Simonyan and Zisserman, 2015] models on the CIFAR-10 dataset [Krizhevsky et al., 2009]. |
| Dataset Splits | Yes | We report top-1 accuracy results evaluated on the validation set, as we vary the compression ratio (original size / pruned size). Unless otherwise specified, we use the per-layer budget selection method described in Section 5.2 for all the layerwise pruning methods... We set aside a subset of the training set to use as a verification set. |
| Hardware Specification | Yes | All experiments are run on an internal cluster with NVIDIA V100 or A100 GPUs. |
| Software Dependencies | No | The code is written in PyTorch [Paszke et al., 2017]. No specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | To compute the gradients and activations used for pruning in LAYERSAMPLING, ACTGRAD, LAYERACTGRAD, and our method's variants, we use four batches of 128 training images, i.e., n = 512, which corresponds to 1% of the training data in MNIST and CIFAR-10. All models are trained for 100 epochs using SGD with momentum 0.9, weight decay 5e-4, and initial learning rate 0.1, which is reduced by a factor of 10 at epochs 50 and 75. (A minimal PyTorch sketch of this training setup is given below the table.) |
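The Pseudocode row refers to the paper's Algorithm 1 (GREEDY). The sketch below is a generic greedy subset-selection routine under a cardinality budget, shown only to illustrate the kind of procedure referenced; the names `objective`, `ground_set`, and `budget` are placeholders, and the actual submodular pruning objective defined in the paper is not reproduced here.

```python
def greedy(objective, ground_set, budget):
    """Illustrative greedy subset selection under a cardinality budget.

    `objective` maps a list of selected elements to a score; `ground_set` is the
    pool of candidates (e.g., neurons in a layer); `budget` is the number to keep.
    These names are placeholders, not the paper's API.
    """
    selected = []
    remaining = set(ground_set)
    current = objective(selected)
    for _ in range(min(budget, len(remaining))):
        # Evaluate the marginal gain of each remaining element and keep the best one.
        gains = {e: objective(selected + [e]) - current for e in remaining}
        best = max(gains, key=gains.get)
        selected.append(best)
        remaining.remove(best)
        current += gains[best]
    return selected
```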
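The Experiment Setup row quotes the training hyperparameters (100 epochs, SGD with momentum 0.9, weight decay 5e-4, initial learning rate 0.1 decayed by 10x at epochs 50 and 75). A minimal PyTorch sketch of that schedule follows; the `model` and the per-epoch training step are hypothetical placeholders, not code from the paper's repository.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

def make_optimizer_and_scheduler(model: nn.Module):
    # SGD with momentum 0.9, weight decay 5e-4, initial learning rate 0.1,
    # reduced by a factor of 10 at epochs 50 and 75 (per the quoted setup).
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = MultiStepLR(optimizer, milestones=[50, 75], gamma=0.1)
    return optimizer, scheduler

# Hypothetical usage: train for 100 epochs, stepping the scheduler once per epoch.
# optimizer, scheduler = make_optimizer_and_scheduler(model)
# for epoch in range(100):
#     train_one_epoch(model, optimizer)  # placeholder training step
#     scheduler.step()
```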