Advancing Model Pruning via Bi-level Optimization

Authors: Yihua Zhang, Yuguang Yao, Parikshit Ram, Pu Zhao, Tianlong Chen, Mingyi Hong, Yanzhi Wang, Sijia Liu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BIP can find better winning tickets than IMP in most cases, and is computationally as efficient as the one-shot pruning schemes, demonstrating 2-7× speedup over IMP for the same level of model accuracy and sparsity.
Researcher Affiliation | Collaboration | 1 Michigan State University, 2 IBM Research, 3 Northeastern University, 4 University of Texas at Austin, 5 University of Minnesota, Twin Cities
Pseudocode | Yes | In Fig. A1, we highlight the algorithmic details on the BIP pipeline. We present more implementation details of BIP below and refer readers to Appendix B for a detailed algorithm description.
Open Source Code | Yes | Codes are available at https://github.com/OPTML-Group/BiP.
Open Datasets | Yes | Following the pruning benchmark in [22], we consider 4 datasets including CIFAR-10 [102], CIFAR-100 [102], Tiny-ImageNet [103], ImageNet [104], and 5 architecture types including ResNet-20/56/18/50 and VGG-16 [105, 106]. (See the data-loading sketch below the table.)
Dataset Splits | Yes | The solid line and shaded area of each pruning method represent the mean and variance of test accuracies over 3 independent trials.
Hardware Specification | Yes | GPU Model(s): NVIDIA A6000
Software Dependencies | Yes | Software environment: Python (3.8.12), PyTorch (1.10.0), Torchvision (0.11.1), CUDA (11.3), CUDNN (8.2.1), Torch-scatter (2.0.9), Torch-sparse (0.6.12), Numpy (1.21.5)
Experiment Setup | Yes | Hyperparameter tuning: As described in (θ-step)-(m-step), BIP needs to set two learning rates α and β for lower-level and upper-level optimization, respectively. We choose α = 0.01 and β = 0.1 in all experiments, where we adopt the mask learning rate β from Hydra [9] and set a smaller lower-level learning rate α, as θ is initialized by a pre-trained dense model. We show an ablation study on α in Fig. A8(c). BLO also brings in the lower-level convexification parameter γ. We set γ = 1.0 in experiments and refer readers to Fig. A8(b) for a sanity check. (See the update-loop sketch below the table.)
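
The dataset and architecture pairs quoted in the Open Datasets row are standard benchmarks. Below is a minimal sketch of how one such pair (CIFAR-10 with ResNet-18) could be instantiated with torchvision; the exact transforms and the CIFAR-style ResNet definition live in the authors' repository and may differ from the torchvision defaults assumed here.

```python
# Minimal sketch: CIFAR-10 + ResNet-18 via torchvision (not the authors' exact setup).
import torch
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),        # standard CIFAR augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

# ImageNet-style ResNet-18 with a 10-class head; the paper's CIFAR-specific variant differs.
model = models.resnet18(num_classes=10)
```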
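
The Experiment Setup row describes BIP's alternating (θ-step)/(m-step) optimization with learning rates α = 0.01 and β = 0.1 and lower-level convexification γ = 1.0. The sketch below illustrates one such alternating update on a toy linear layer, assuming a Hydra-style mask parameterization (continuous scores, hard top-k threshold, straight-through gradients) and omitting the closed-form implicit-gradient correction the paper derives for the m-step; it shows the structure of the update only and is not the authors' implementation.

```python
# Simplified alternating (theta-step)/(m-step) sketch with alpha=0.01, beta=0.1, gamma=1.0.
# The mask parameterization and the omission of the implicit-gradient term are assumptions.
import torch
import torch.nn.functional as F


def hard_topk_mask(scores: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the (1 - sparsity) fraction of largest scores."""
    k = int((1.0 - sparsity) * scores.numel())
    threshold = torch.topk(scores.flatten(), k).values.min()
    mask = (scores >= threshold).float()
    # Straight-through estimator: binary values forward, identity gradient backward.
    return mask + scores - scores.detach()


def bip_step(weight, scores, x, y, sparsity, alpha=0.01, beta=0.1, gamma=1.0):
    """One alternating update on a single linear layer (toy example)."""
    # (theta-step): SGD on the convexified lower-level objective
    #   loss(m ⊙ theta) + (gamma / 2) * ||theta||^2
    mask = hard_topk_mask(scores, sparsity).detach()
    theta = weight.detach().requires_grad_(True)
    lower = F.cross_entropy(x @ (mask * theta).t(), y) + 0.5 * gamma * theta.pow(2).sum()
    (g_theta,) = torch.autograd.grad(lower, theta)
    weight = weight.detach() - alpha * g_theta

    # (m-step): gradient step on the upper-level loss w.r.t. the mask scores.
    # The full BIP update adds an implicit-gradient correction scaled by 1/gamma;
    # only the direct gradient is kept here for brevity.
    scores = scores.detach().requires_grad_(True)
    mask = hard_topk_mask(scores, sparsity)
    upper = F.cross_entropy(x @ (mask * weight).t(), y)
    (g_scores,) = torch.autograd.grad(upper, scores)
    scores = scores.detach() - beta * g_scores
    return weight, scores


if __name__ == "__main__":
    torch.manual_seed(0)
    weight = torch.randn(10, 32)            # stands in for pre-trained dense weights
    scores = weight.abs().clone()           # magnitude-initialized mask scores
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    for _ in range(5):
        weight, scores = bip_step(weight, scores, x, y, sparsity=0.9)
```

The γ-term in the θ-step is the quadratic convexification of the lower-level objective referred to in the quote; setting γ = 0 in this sketch would reduce it to plain alternating SGD on the masked loss.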