Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Advancing Model Pruning via Bi-level Optimization
Authors: Yihua Zhang, Yuguang Yao, Parikshit Ram, Pu Zhao, Tianlong Chen, Mingyi Hong, Yanzhi Wang, Sijia Liu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BIP can find better winning tickets than IMP in most cases, and is computationally as efficient as the one-shot pruning schemes, demonstrating 2-7× speedup over IMP for the same level of model accuracy and sparsity. |
| Researcher Affiliation | Collaboration | ¹Michigan State University, ²IBM Research, ³Northeastern University, ⁴University of Texas at Austin, ⁵University of Minnesota, Twin Cities |
| Pseudocode | Yes | In Fig. A1, we highlight the algorithmic details on the BIP pipeline. We present more implementation details of BIP below and refer readers to Appendix B for a detailed algorithm description. |
| Open Source Code | Yes | Codes are available at https://github.com/OPTML-Group/BiP. |
| Open Datasets | Yes | Following the pruning benchmark in [22], we consider 4 datasets including CIFAR-10 [102], CIFAR-100 [102], Tiny-ImageNet [103], ImageNet [104], and 5 architecture types including ResNet-20/56/18/50 and VGG-16 [105, 106]. |
| Dataset Splits | Yes | The solid line and shaded area of each pruning method represent the mean and variance of test accuracies over 3 independent trials. |
| Hardware Specification | Yes | GPU Model(s): NVIDIA A6000 |
| Software Dependencies | Yes | software environment: Python (3.8.12), Pytorch (1.10.0), Torchvision (0.11.1), CUDA (11.3), CUDNN (8.2.1), Torch-scatter (2.0.9), Torch-sparse (0.6.12), Numpy (1.21.5) |
| Experiment Setup | Yes | Hyperparameter tuning: As described in (θ-step)-(m-step), BIP needs to set two learning rates α and β for lower-level and upper-level optimization, respectively. We choose α = 0.01 and β = 0.1 in all experiments, where we adopt the mask learning rate β from Hydra [9] and set a smaller lower-level learning rate α, as θ is initialized by a pre-trained dense model. We show ablation study on α in Fig. A8(c). BLO also brings in the low-level convexification parameter γ. We set γ = 1.0 in experiments and refer readers to Fig. A8(b) for a sanity check. |
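
The reported setup can be recorded in code for anyone attempting a reproduction. The following is a minimal sketch, not the authors' implementation: it stores the hyperparameters quoted in the Experiment Setup row (α = 0.01, β = 0.1, γ = 1.0) and summarizes test accuracies over 3 independent trials as a mean and variance, as described in the Dataset Splits row. The names `BiPSetup` and `summarize_trials` are hypothetical, the accuracy values are placeholders rather than results from the paper, and the use of sample variance is an assumption because the source does not state which estimator was used.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass(frozen=True)
class BiPSetup:
    """Hyperparameters as quoted in the Experiment Setup row (hypothetical container)."""
    lower_lr: float = 0.01  # alpha: lower-level learning rate (model weights theta)
    upper_lr: float = 0.1   # beta: upper-level learning rate (pruning mask m)
    gamma: float = 1.0      # lower-level convexification parameter
    num_trials: int = 3     # independent trials per pruning method


def summarize_trials(test_accuracies):
    """Return (mean, variance) of test accuracies across independent trials.

    Assumption: sample variance (n - 1 denominator); the source does not
    say which estimator the reported shaded area uses.
    """
    if len(test_accuracies) < 2:
        raise ValueError("need at least two trials to estimate variance")
    return mean(test_accuracies), variance(test_accuracies)


setup = BiPSetup()

# Placeholder accuracies (percent) for illustration only, not from the paper.
example_accuracies = [90.0, 91.0, 92.0]
assert len(example_accuracies) == setup.num_trials

avg, var = summarize_trials(example_accuracies)
print(f"mean={avg:.2f}, variance={var:.2f}")
```

Keeping the setup in a frozen dataclass makes it harder to change a learning rate silently between runs, which helps when comparing against the reported curves.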