VTC-LFC: Vision Transformer Compression with Low-Frequency Components
Authors: Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed method could save 40%~60% of the FLOPs in ViTs, thus significantly increasing the throughput on practical devices with less than 1% performance drop on ImageNet-1K. |
| Researcher Affiliation | Industry | Zhenyu Wang, Alibaba Group, daner.wzy@alibaba-inc.com; Hao Luo, Alibaba Group, michuan.lh@alibaba-inc.com; Pichao Wang, Alibaba Group, pichao.wang@alibaba-inc.com; Feng Ding, Alibaba Group, dingfeng.dingfeng@alibaba-inc.com; Fan Wang, Alibaba Group, fan.w@alibaba-inc.com; Hao Li, Alibaba Group, lihao.lh@alibaba-inc.com |
| Pseudocode | Yes | More details are described in Algorithm 1 in Appendix A.1. |
| Open Source Code | Yes | Code will be available at https://github.com/Daner-Wang/VTC-LFC.git. |
| Open Datasets | Yes | In this section, the proposed method is evaluated on the benchmark ImageNet (ILSVRC2012) [43], which is a large dataset containing 1.2M training images and 50k validation images of 1000 classes. |
| Dataset Splits | Yes | In this section, the proposed method is evaluated on the benchmark ImageNet (ILSVRC2012) [43], which is a large dataset containing 1.2M training images and 50k validation images of 1000 classes. |
| Hardware Specification | Yes | All the experiments are deployed with Pytorch [39] on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify its version number or any other software dependencies with their specific versions. |
| Experiment Setup | Yes | In the pruning procedure, the number of training samples used for evaluating the performance drop in BCP is 5000 (randomly sampling 5 training samples from each category), the number of training samples for calculating LFS is 2000, and the cutoff factors σc and σt are 0.1 and 0.85. For the three models, DeiT-Tiny, DeiT-Small, and DeiT-Base, the global allowable drop ε are 9.5, 14, and 14, and the ratio ρ for the allowable drop is 0.56, 0.35, and 0.3 respectively. The base learning rate is set to 0.0001, and most of the other hyper-parameters follow the settings in [9]. We fine-tune the pruned DeiT-Tiny/DeiT-Small/DeiT-Base models for 300/150/150 epochs. More detailed settings and results of different epochs are listed in Appendix A.3. |
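Because the Experiment Setup row above packs several hyper-parameters into a single sentence, the sketch below restates them as a plain Python configuration and shows one way to draw the 5-images-per-class subset used for the BCP performance-drop estimate. This is a minimal illustration only: the dictionary layout, key names, and the `sample_per_class` helper are our own assumptions and are not taken from the authors' released code.

```python
import random
from collections import defaultdict

# Hyper-parameters transcribed from the Experiment Setup row above.
# Key names are illustrative, not the authors' API.
PRUNING_CONFIG = {
    "bcp_eval_samples": 5000,   # 5 training images per ImageNet-1K class for the BCP drop estimate
    "lfs_samples": 2000,        # images used to compute the low-frequency sensitivity (LFS)
    "sigma_c": 0.1,             # cutoff factor σc
    "sigma_t": 0.85,            # cutoff factor σt
    "base_lr": 1e-4,            # base learning rate for fine-tuning
    # Per-model settings: global allowable drop ε, allowable-drop ratio ρ, fine-tuning epochs
    "models": {
        "deit_tiny":  {"epsilon": 9.5, "rho": 0.56, "finetune_epochs": 300},
        "deit_small": {"epsilon": 14,  "rho": 0.35, "finetune_epochs": 150},
        "deit_base":  {"epsilon": 14,  "rho": 0.30, "finetune_epochs": 150},
    },
}


def sample_per_class(labels, per_class=5, seed=0):
    """Randomly pick `per_class` training indices from each class.

    `labels` is a sequence of integer class labels (e.g. from an ImageNet-1K
    training set); with 1000 classes and per_class=5 this yields the
    5000-sample subset described for the BCP evaluation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    subset = []
    for idxs in by_class.values():
        subset.extend(rng.sample(idxs, min(per_class, len(idxs))))
    return subset
```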