Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Width & Depth Pruning for Vision Transformers
Authors: Fang Yu, Kun Huang, Meng Wang, Yuan Cheng, Wei Chu, Li Cui | pp. 3143-3151
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on benchmark datasets demonstrate that the proposed method can significantly reduce the computational costs of mainstream vision transformers such as DeiT and Swin Transformer with a minor accuracy drop. In particular, on ILSVRC-12, we achieve over 22% pruning ratio of FLOPs by compressing DeiT-Base, even with an increase of 0.14% Top-1 accuracy. |
| Researcher Affiliation | Collaboration | Fang Yu1,2, Kun Huang3, Meng Wang3, Yuan Cheng3*, Wei Chu3, Li Cui1; 1 Institute of Computing Technology, Chinese Academy of Sciences; 2 University of Chinese Academy of Sciences; 3 Ant Financial Services Group |
| Pseudocode | Yes | The detailed procedure is presented in Algorithm 1. |
| Open Source Code | No | The paper does not contain an explicit statement about making the source code for their methodology available, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Datasets CIFAR-10 contains 50k training images and 10k validation images, which are categorized into 10 classes for image classification. Compared with CIFAR-10, ILSVRC-12 is a larger-scale image classification dataset, which comprises 1.28 million images from 1k categories for training and 50k images for validation. |
| Dataset Splits | Yes | Datasets CIFAR-10 contains 50k training images and 10k validation images, which are categorized into 10 classes for image classification. Compared with CIFAR-10, ILSVRC-12 is a larger-scale image classification dataset, which comprises 1.28 million images from 1k categories for training and 50k images for validation. |
| Hardware Specification | Yes | The GPU throughput is obtained by measuring the forward time on an NVIDIA RTX 3090 GPU with a batch size of 1024, and the latency on CPU is measured on an AMD EPYC 7502 32-Core CPU with a batch size of 1. |
| Software Dependencies | No | The paper mentions software like "AdamW optimizer", "TensorRT", and "ONNX", but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | The initial learning rate is 0.0005. We use AdamW optimizer with a momentum of 0.9 for optimization. We set the weight decay to 0.05. [...] The learning rates of saliency scores and threshold parameters are set to 0.025 initially, and they are finetuned with AdamW with cosine learning rate decay strategy. |
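
The Experiment Setup row implies two parameter groups with different initial learning rates. The sketch below shows, under stated assumptions, how a cosine decay would evolve both rates. It is not the authors' code: the schedule length (`TOTAL_EPOCHS`) is hypothetical because this summary does not report it, reading the "momentum of 0.9" as AdamW's beta1 is an assumption, and applying cosine decay to the network weights (the quote only states it for the saliency scores and thresholds) is also an assumption. The helper names `cosine_lr` and `group_lrs` are illustrative.

```python
import math

# Values quoted in the Experiment Setup row.
WEIGHT_LR = 5e-4    # initial learning rate for the network weights
MASK_LR = 0.025     # initial learning rate for saliency scores and threshold parameters
ADAMW_BETA1 = 0.9   # assumption: the reported "momentum of 0.9" is AdamW's beta1
WEIGHT_DECAY = 0.05

# Hypothetical: the schedule length is not given in this summary.
TOTAL_EPOCHS = 100


def cosine_lr(initial_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-indexed epoch, clamped at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def group_lrs(epoch, total_epochs=TOTAL_EPOCHS):
    """Learning rate per parameter group; cosine decay on the weights is an assumption."""
    return {
        "weights": cosine_lr(WEIGHT_LR, epoch, total_epochs),
        "saliency_and_thresholds": cosine_lr(MASK_LR, epoch, total_epochs),
    }


# Print the two learning rates at the start, middle, and end of the schedule.
for epoch in (0, 50, 100):
    lrs = group_lrs(epoch)
    print(epoch, round(lrs["weights"], 6), round(lrs["saliency_and_thresholds"], 6))
```

In PyTorch, the same setup could be expressed by passing two parameter groups (each with its own `lr`) to `torch.optim.AdamW` with `betas=(0.9, 0.999)` and `weight_decay=0.05`, then wrapping it in `torch.optim.lr_scheduler.CosineAnnealingLR`. Whether the weight decay also applies to the saliency and threshold parameters is not stated here.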