CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Authors: Denis Kuznedelev, Eldar Kurtić, Elias Frantar, Dan Alistarh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach via extensive experiments on several modern vision models such as Vision Transformers (ViT), modern CNNs, and ViT-CNN hybrids, showing for the first time that these can be pruned to high sparsity levels (e.g. ≥75%) with low impact on accuracy (≤1% relative drop). |
| Researcher Affiliation | Collaboration | Denis Kuznedelev (Skoltech & Yandex, Denis.Kuznedelev@skoltech.ru); Eldar Kurtic (IST Austria, eldar.kurtic@ist.ac.at); Elias Frantar (IST Austria, elias.frantar@ist.ac.at); Dan Alistarh (IST Austria & Neural Magic, dan.alistarh@ist.ac.at) |
| Pseudocode | Yes | Appendix A (CAP algorithm description): The section below illustrates the CAP pruning algorithm step-by-step... Algorithm 1: CAP pruning algorithm. Appendix B (Fast CAP, a faster version of the algorithm): The full pseudocode for Fast CAP is provided in Algorithm 2. |
| Open Source Code | Yes | The code is available at https://github.com/IST-DASLab/CAP |
| Open Datasets | Yes | For instance, experiments on the standard ImageNet-1K benchmark [37] show for the first time that ViT models can attain high sparsity levels without significant accuracy impact |
| Dataset Splits | Yes | We consider the ImageNet [37] image classification benchmark... We aim to obtain accurate sparse checkpoints for 50%, 60%, 75%, 80%, and 90% sparsity... Table 6: Hyperparameters of the schedules used in gradual pruning. DeiT-Tiny: prune freq 20, LR sched {cyclic_linear, η_max = 5·10⁻⁴, η_min = 1·10⁻⁵}, augm light1, batch size 1024, 300 epochs; DeiT-Small: prune freq 20, LR sched {cyclic_linear, η_max = 5·10⁻⁴, η_min = 1·10⁻⁵}, augm deit, batch size 1024, 300 epochs |
| Hardware Specification | Yes | Specifically, we executed the models from Table 2 using 4 cores of an Intel(R) Xeon(R) Gold 6238R CPU, at batch size 64... The pruning step for the whole model takes 400 seconds on a single A100 GPU... All measurements were done on a single RTX A6000 GPU with 48GB of memory... The inference was executed on an NVIDIA T4 GPU |
| Software Dependencies | No | The paper mentions 'PyTorch image models' [47] as a library used, but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | To achieve best performance, modern training procedures involve longer training schedules together with a careful choice of hyperparameters... Specifically, we propose to use a cyclic linear schedule: η(t) = η_max − (η_max − η_min)·(t mod T)/T... We provide detailed parameter values and ablations in Appendix C... Table 6: Hyperparameters of the schedules used in gradual pruning. DeiT-Tiny: prune freq 20, LR sched {cyclic_linear, η_max = 5·10⁻⁴, η_min = 1·10⁻⁵}, augm light1, batch size 1024, 300 epochs |
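The cyclic linear learning-rate schedule quoted in the Experiment Setup row, η(t) = η_max − (η_max − η_min)·(t mod T)/T, can be sketched in a few lines of Python. This is a minimal illustration, not code from the CAP repository: the function name is mine, the interpretation of T as the number of steps per cycle (i.e., between pruning steps) is an assumption, and the default η_max/η_min values are taken from Table 6 for DeiT-Tiny.

```python
def cyclic_linear_lr(t, period, eta_max=5e-4, eta_min=1e-5):
    """Cyclic linear LR: decay linearly from eta_max toward eta_min
    within each cycle of `period` steps, then reset.

    Hypothetical helper illustrating eta(t) = eta_max - (eta_max - eta_min) * (t mod period) / period.
    Defaults follow Table 6 (DeiT-Tiny): eta_max = 5e-4, eta_min = 1e-5.
    """
    frac = (t % period) / period  # position within the current cycle, in [0, 1)
    return eta_max - (eta_max - eta_min) * frac
```

The reset at each cycle boundary matches the gradual-pruning setting: after every pruning step the learning rate jumps back to η_max so the densified training can recover accuracy before the next pruning step.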