UPSCALE: Unconstrained Channel Pruning

Authors: Alvin Wan, Hanxiang Hao, Kaushik Patnaik, Yueyang Xu, Omer Hadad, David Güera, Zhile Ren, Qi Shan

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We present extensive experimentation to show that unconstrained pruning can attain significantly higher accuracy than constrained pruning, especially for modern, larger models and for those with complex topologies. For these unconstrained pruned models, we then show that UPSCALE outperforms a baseline export in inference-time latency." |
| Researcher Affiliation | Industry | "Apple, Cupertino, USA. Correspondence to: Alvin Wan <alvinwan@apple.com>, Qi Shan <qshan@apple.com>." |
| Pseudocode | Yes | "Algorithm 1 UPSCALE" |
| Open Source Code | Yes | https://github.com/apple/ml-upscale |
| Open Datasets | Yes | "All accuracies are reported on the ImageNet ILSVRC 2015 (Russakovsky et al., 2015) validation dataset." |
| Dataset Splits | Yes | "All accuracies are reported on the ImageNet ILSVRC 2015 (Russakovsky et al., 2015) validation dataset." |
| Hardware Specification | Yes | "We use a single V100 GPU with 32 GB RAM." |
| Software Dependencies | No | "To export models for timing, we run an existing pruning strategy on the provided model, export using UPSCALE, then use PyTorch's jit trace to produce a Python-less executable. This traced model is then benchmarked using PyTorch's built-in profiling utility, including CUDA activities and tracking tensor memory allocation." The paper mentions PyTorch and CUDA but does not provide specific version numbers. |
| Experiment Setup | Yes | "We sparsify parameters at intervals of 2.5% from 0% to 100% and test 5 pruning strategies across 15+ architectures. All our latency measurements are the aggregate of 100 runs, with both mean and standard deviations reported. We channel prune DenseNet121 at 10%, 20%, 30%, 40%, 50% parameter sparsity using the LAMP heuristic... We then fine-tune all 10 models for 5 epochs each." |
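The latency protocol quoted above (an aggregate of 100 runs, with mean and standard deviation reported) can be sketched as a simple timing harness. This is a minimal sketch under stated assumptions, not the authors' benchmarking code: `dummy_model` is a placeholder standing in for the executable produced by PyTorch's jit trace, and the warmup count is an illustrative choice.

```python
import statistics
import time

def benchmark(fn, *args, runs=100, warmup=10):
    """Time fn over `runs` calls, returning (mean, stdev) in milliseconds.

    Warmup iterations are excluded from the statistics, mirroring common
    practice when benchmarking traced models; the paper aggregates 100 runs.
    """
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1e3)  # seconds -> ms
    return statistics.mean(timings), statistics.stdev(timings)

# Placeholder workload standing in for a traced model's forward pass.
def dummy_model(x):
    return [v * 2 for v in x]

mean_ms, std_ms = benchmark(dummy_model, list(range(1000)))
```

With a real model, one would replace `dummy_model` with the traced executable and synchronize the GPU around each call before reading the clock, since CUDA kernels launch asynchronously.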