UPSCALE: Unconstrained Channel Pruning

Authors: Alvin Wan, Hanxiang Hao, Kaushik Patnaik, Yueyang Xu, Omer Hadad, David Güera, Zhile Ren, Qi Shan

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We present extensive experimentation to show that unconstrained pruning can attain significantly higher accuracy than constrained pruning, especially for modern, larger models and for those with complex topologies. For these unconstrained pruned models, we then show that UPSCALE outperforms a baseline export in inference-time latency." |
| Researcher Affiliation | Industry | "Apple, Cupertino, USA. Correspondence to: Alvin Wan <alvinwan@apple.com>, Qi Shan <qshan@apple.com>." |
| Pseudocode | Yes | "Algorithm 1 UPSCALE" |
| Open Source Code | Yes | https://github.com/apple/ml-upscale |
| Open Datasets | Yes | "All accuracies are reported on the ImageNet ILSVRC 2015 (Russakovsky et al., 2015) validation dataset." |
| Dataset Splits | Yes | "All accuracies are reported on the ImageNet ILSVRC 2015 (Russakovsky et al., 2015) validation dataset." |
| Hardware Specification | Yes | "We use a single V100 GPU with 32 GB RAM." |
| Software Dependencies | No | "To export models for timing, we run an existing pruning strategy on the provided model, export using UPSCALE, then use PyTorch's jit trace to produce a Python-less executable. This traced model is then benchmarked using PyTorch's built-in profiling utility, including CUDA activities and tracking tensor memory allocation." The paper mentions PyTorch and CUDA but does not provide specific version numbers. |
| Experiment Setup | Yes | "We sparsify parameters at intervals of 2.5% from 0% to 100% and test 5 pruning strategies across 15+ architectures. All our latency measurements are the aggregate of 100 runs, with both mean and standard deviations reported. We channel prune DenseNet121 at 10%, 20%, 30%, 40%, 50% parameter sparsity using the LAMP heuristic... We then fine-tune all 10 models for 5 epochs each." |
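The latency protocol quoted above (an aggregate of 100 runs, with mean and standard deviation reported) can be sketched as a simple timing harness. This is a minimal sketch under stated assumptions, not the authors' benchmarking code: `dummy_model` is a placeholder standing in for the executable produced by PyTorch's jit trace, and the warmup count is an illustrative choice.

```python
import statistics
import time

def benchmark(fn, *args, runs=100, warmup=10):
    """Time fn over `runs` calls, returning (mean, stdev) in milliseconds.

    Warmup iterations are excluded from the statistics, mirroring common
    practice when benchmarking traced models; the paper aggregates 100 runs.
    """
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1e3)  # seconds -> ms
    return statistics.mean(timings), statistics.stdev(timings)

# Placeholder workload standing in for a traced model's forward pass.
def dummy_model(x):
    return [v * 2 for v in x]

mean_ms, std_ms = benchmark(dummy_model, list(range(1000)))
```

With a real model, one would replace `dummy_model` with the traced executable and synchronize the GPU around each call before reading the clock, since CUDA kernels launch asynchronously.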