VTC-LFC: Vision Transformer Compression with Low-Frequency Components

Authors: Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that the proposed method could save 40%-60% of the FLOPs in ViTs, thus significantly increasing the throughput on practical devices with less than 1% performance drop on ImageNet-1K."
Researcher Affiliation | Industry | Zhenyu Wang (Alibaba Group, daner.wzy@alibaba-inc.com); Hao Luo (Alibaba Group, michuan.lh@alibaba-inc.com); Pichao Wang (Alibaba Group, pichao.wang@alibaba-inc.com); Feng Ding (Alibaba Group, dingfeng.dingfeng@alibaba-inc.com); Fan Wang (Alibaba Group, fan.w@alibaba-inc.com); Hao Li (Alibaba Group, lihao.lh@alibaba-inc.com)
Pseudocode | Yes | "More details are described in Algorithm 1 in Appendix A.1."
Open Source Code | Yes | "Code will be available at https://github.com/Daner-Wang/VTC-LFC.git."
Open Datasets | Yes | "In this section, the proposed method is evaluated on the benchmark ImageNet (ILSVRC2012) [43], which is a large dataset containing 1.2M training images and 50k validation images of 1000 classes."
Dataset Splits | Yes | "In this section, the proposed method is evaluated on the benchmark ImageNet (ILSVRC2012) [43], which is a large dataset containing 1.2M training images and 50k validation images of 1000 classes."
Hardware Specification | Yes | "All the experiments are deployed with PyTorch [39] on NVIDIA V100 GPUs."
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with their specific versions.
Experiment Setup | Yes | "In the pruning procedure, the number of training samples used for evaluating the performance drop in BCP is 5000 (randomly sampling 5 training samples from each category), the number of training samples for calculating LFS is 2000, and the cutoff factors σ_c and σ_t are 0.1 and 0.85. For the three models, DeiT-Tiny, DeiT-Small, and DeiT-Base, the global allowable drop ε is 9.5, 14, and 14, and the ratio ρ for the allowable drop is 0.56, 0.35, and 0.3, respectively. The base learning rate is set to 0.0001, and most of the other hyper-parameters follow the settings in [9]. We fine-tune the pruned DeiT-Tiny/DeiT-Small/DeiT-Base models for 300/150/150 epochs. More detailed settings and results of different epochs are listed in Appendix A.3."
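
To make the quoted setup concrete, below is a minimal Python sketch that collects the reported pruning and fine-tuning hyper-parameters in one place and reproduces the 5-samples-per-class selection used to build the 5000-image BCP evaluation set. The structure here (the config dicts and the sample_per_class helper) is illustrative and assumed for this report; it is not taken from the authors' released code.

```python
import random
from collections import defaultdict

# Reported per-model settings: epsilon = global allowable drop,
# rho = ratio for the allowable drop, epochs = fine-tuning epochs.
MODEL_CONFIGS = {
    "deit-tiny":  {"epsilon": 9.5, "rho": 0.56, "epochs": 300},
    "deit-small": {"epsilon": 14,  "rho": 0.35, "epochs": 150},
    "deit-base":  {"epsilon": 14,  "rho": 0.30, "epochs": 150},
}

# Reported settings shared across models.
SHARED_CONFIG = {
    "bcp_eval_samples": 5000,  # 5 training images per class x 1000 classes
    "lfs_samples": 2000,       # images used to compute LFS
    "sigma_c": 0.1,            # cutoff factor
    "sigma_t": 0.85,           # cutoff factor
    "base_lr": 1e-4,           # base learning rate for fine-tuning
}

def sample_per_class(labels, k=5, seed=0):
    """Randomly draw k sample indices from each class, as in the paper's
    BCP evaluation set (5 per class over 1000 ImageNet classes = 5000)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    picked = []
    for idxs in by_class.values():
        picked.extend(rng.sample(idxs, min(k, len(idxs))))
    return picked
```

Given the class labels of the ImageNet training set, sample_per_class(labels, k=5) would yield the 5000-image subset described above; the per-model epsilon/rho/epoch values then parameterize pruning and fine-tuning for each DeiT variant.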