CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models

Authors: Denis Kuznedelev, Eldar Kurtić, Elias Frantar, Dan Alistarh

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our approach via extensive experiments on several modern vision models such as Vision Transformers (ViT), modern CNNs, and ViT-CNN hybrids, showing for the first time that these can be pruned to high sparsity levels (e.g. 75%) with low impact on accuracy (≤1% relative drop).
Researcher Affiliation | Collaboration | Denis Kuznedelev (Skoltech & Yandex, Denis.Kuznedelev@skoltech.ru); Eldar Kurtic (IST Austria, eldar.kurtic@ist.ac.at); Elias Frantar (IST Austria, elias.frantar@ist.ac.at); Dan Alistarh (IST Austria & Neural Magic, dan.alistarh@ist.ac.at)
Pseudocode | Yes | Appendix A (CAP algorithm description): "The section below illustrates the CAP pruning algorithm step-by-step..." (Algorithm 1: CAP pruning algorithm); Appendix B (Fast CAP, a faster version of the algorithm): "The full pseudocode for Fast CAP is provided in Algorithm 2."
Open Source Code | Yes | The code is available at https://github.com/IST-DASLab/CAP
Open Datasets | Yes | For instance, experiments on the standard ImageNet-1K benchmark [37] show for the first time that ViT models can attain high sparsity levels without significant accuracy impact.
Dataset Splits | Yes | We consider the ImageNet [37] image classification benchmark... We aim to obtain accurate sparse checkpoints for 50%, 60%, 75%, 80%, and 90% sparsity... Table 6 (hyperparameters of the schedules used in gradual pruning) lists, per model: prune frequency, LR schedule {f_decay, η_max, η_min}, augmentation, batch size, and epochs — e.g. DeiT-Tiny: 20, {cyclic_linear, 5e-4, 1e-5}, light1, 1024, 300; DeiT-Small: 20, {cyclic_linear, 5e-4, 1e-5}, deit, 1024, 300.
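For concreteness, a "75% sparsity" checkpoint means three quarters of a layer's weights are zero. The sketch below uses a naive magnitude criterion purely to illustrate the sparsity arithmetic; it is not the paper's correlation-aware CAP criterion, and the function name and weight values are illustrative:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of smallest-magnitude weights.
    Naive magnitude baseline for illustration only; CAP instead ranks
    weights with a correlation-aware saliency criterion."""
    k = int(round(sparsity * len(weights)))  # number of weights to zero
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])  # indices of the k smallest-|w| entries
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.1, -2.0, 0.03, 1.5, -0.2, 0.9, -0.05, 3.0]
sparse_w = magnitude_prune(w, 0.75)  # 6 of the 8 weights become zero
```

At 75% sparsity only the two largest-magnitude weights (-2.0 and 3.0) survive; the gradual-pruning schedules above repeat such a step at each sparsity target while fine-tuning in between.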
Hardware Specification | Yes | Specifically, we executed the models from Table 2 using 4 cores of an Intel(R) Xeon(R) Gold 6238R CPU, at batch size 64... The pruning step for the whole model takes 400 seconds on a single A100 GPU... All measurements were done on a single RTX A6000 GPU with 48GB of memory... The inference was executed on an Nvidia T4 GPU.
Software Dependencies | No | The paper mentions "PyTorch image models" [47] as a library used, but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | To achieve best performance, modern training procedures involve longer training schedules together with a careful choice of hyperparameters... Specifically, we propose to use a cyclic linear schedule: η(t) = η_max − (η_max − η_min) · (t mod T) / T... We provide detailed parameter values and ablations in Appendix C... Table 6 (hyperparameters of the schedules used in gradual pruning), e.g. DeiT-Tiny: prune frequency 20, LR schedule {cyclic_linear, 5e-4, 1e-5}, light1 augmentation, batch size 1024, 300 epochs.
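The cyclic linear schedule quoted above can be sketched in a few lines. This is a hedged illustration: the defaults mirror the Table 6 values (η_max = 5e-4, η_min = 1e-5), but tying the cycle length to the pruning frequency of 20 is an assumption, not the authors' exact implementation:

```python
def cyclic_linear_lr(t, eta_max=5e-4, eta_min=1e-5, cycle_len=20):
    """Cyclic linear LR schedule: within each cycle of `cycle_len` steps,
    the LR decays linearly from eta_max down toward eta_min, then restarts.
    Defaults follow Table 6; `cycle_len` = pruning frequency is assumed."""
    phase = (t % cycle_len) / cycle_len  # fraction of the current cycle elapsed
    return eta_max - (eta_max - eta_min) * phase
```

Each restart returns the learning rate to η_max, which matches the gradual-pruning loop: every pruning step is followed by a fresh decay cycle of fine-tuning.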