Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Data-Efficient Structured Pruning via Submodular Optimization

Authors: Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime.
Researcher Affiliation | Collaboration | Marwa El Halabi (Samsung SAIT AI Lab, Montreal); Suraj Srinivas (Harvard University); Simon Lacoste-Julien (Mila, Université de Montréal; Samsung SAIT AI Lab, Montreal)
Pseudocode | Yes | Algorithm 1 GREEDY
Open Source Code | Yes | The code for reproducing all experiments is available at https://github.com/marwash25/subpruning.
Open Datasets | Yes | LeNet model [LeCun et al., 1989] on the MNIST dataset [LeCun et al., 1998], and on the ResNet56 [He et al., 2016] and the VGG11 [Simonyan and Zisserman, 2015] models on the CIFAR-10 dataset [Krizhevsky et al., 2009].
Dataset Splits | Yes | We report top-1 accuracy results evaluated on the validation set, as we vary the compression ratio (original size / pruned size). Unless otherwise specified, we use the per-layer budget selection method described in Section 5.2 for all the layerwise pruning methods... We set aside a subset of the training set to use as a verification set.
Hardware Specification | Yes | All experiments are run on an internal cluster with NVIDIA V100 or A100 GPUs.
Software Dependencies | No | The code is written in PyTorch [Paszke et al., 2017]. No specific version number for PyTorch or other software dependencies is provided.
Experiment Setup | Yes | To compute the gradients and activations used for pruning in LAYERSAMPLING, ACTGRAD, LAYERACTGRAD, and our method's variants, we use four batches of 128 training images, i.e., n = 512, which corresponds to 1% of the training data in MNIST and CIFAR10. All models are trained for 100 epochs using SGD with momentum 0.9, weight decay 5e-4, and initial learning rate 0.1, which is reduced by a factor of 10 at epochs 50 and 75.
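
The numbers quoted for the experiment setup can be checked with a short sketch. It is not taken from the authors' repository: the constant and function names are hypothetical, epochs are assumed to be 0-indexed, and each decay is assumed to take effect at the start of the milestone epoch. The training-set sizes (60,000 for MNIST, 50,000 for CIFAR-10) are the standard ones and do not account for any held-out verification subset.

```python
# Values quoted in the Experiment Setup entry; all names here are hypothetical.
BATCH_SIZE = 128
NUM_PRUNING_BATCHES = 4
TOTAL_EPOCHS = 100
INITIAL_LR = 0.1
LR_DECAY_FACTOR = 10
LR_MILESTONES = (50, 75)
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4

# Standard training-set sizes (assumption: no verification subset removed).
TRAIN_SET_SIZES = {"MNIST": 60_000, "CIFAR-10": 50_000}


def pruning_sample_count(num_batches=NUM_PRUNING_BATCHES, batch_size=BATCH_SIZE):
    """Number of training images used to compute gradients and activations."""
    return num_batches * batch_size


def pruning_sample_fraction(dataset):
    """Share of the (assumed) full training set covered by the pruning samples."""
    return pruning_sample_count() / TRAIN_SET_SIZES[dataset]


def learning_rate(epoch, initial_lr=INITIAL_LR, milestones=LR_MILESTONES,
                  decay_factor=LR_DECAY_FACTOR):
    """Step schedule: divide the rate by decay_factor at each milestone reached."""
    num_decays = sum(1 for milestone in milestones if epoch >= milestone)
    return initial_lr / (decay_factor ** num_decays)


if __name__ == "__main__":
    print("pruning samples:", pruning_sample_count())
    for name in TRAIN_SET_SIZES:
        print(f"{name}: {pruning_sample_fraction(name):.2%} of training data")
    for epoch in (0, 49, 50, 74, 75, TOTAL_EPOCHS - 1):
        print(f"epoch {epoch}: lr = {learning_rate(epoch):g}")
```

Under these assumptions, 512 images are slightly under 1% of MNIST and slightly over 1% of CIFAR-10, consistent with the rounded figure in the table. In PyTorch, a schedule of this shape is typically expressed with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[50, 75]` and `gamma=0.1`; whether the authors' code does so is not stated here.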