FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Authors: Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on various transformer-based architectures and benchmarks show that our Fully Quantized Vision Transformer (FQ-ViT) outperforms previous works while even using lower bit-width on attention maps. For instance, we reach 84.89% top-1 accuracy with ViT-L on ImageNet and 50.8 mAP with Cascade Mask R-CNN (Swin-S) on COCO. To our knowledge, we are the first to achieve lossless accuracy degradation (~1%) on fully quantized vision transformers.
Researcher Affiliation | Industry | Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li and Shuchang Zhou, MEGVII Technology; linyang.zhh@gmail.com, {zhangtianyu, sunpeiqin, lizheng02, zsc}@megvii.com
Pseudocode | No | The paper describes algorithms and formulas but does not include explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/megvii-research/FQ-ViT.
Open Datasets | Yes | We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. ImageNet [Krizhevsky et al., 2012] and COCO [Lin et al., 2014].
Dataset Splits | Yes | We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. (A calibration-sampling sketch follows the table.)
Hardware Specification | No | The paper does not specify the hardware used for running the experiments. It only mentions general concepts such as 'resource-constrained hardware devices' and 'floating-point units in the hardware'.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. Apart from special notes, we perform symmetric channel-wise quantization for weights and asymmetric layer-wise quantization for activations. For a fair comparison, the quantization for weights is fixed as MinMax. The hyperparameter K in Power-of-Two Factor is set to 3. (A quantizer sketch follows the table.)
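
The calibration protocol quoted above (1000 randomly sampled training images for calibration, with accuracy reported on the validation set) can be reproduced with a few lines of PyTorch/torchvision. This is a minimal sketch, not the authors' script: the dataset paths, preprocessing, and batch sizes are assumptions for illustration.

import random
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Standard ImageNet eval-style preprocessing (assumed; not specified in the paper).
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=transform)
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=transform)

# Randomly sample 1000 training images as calibration data, as described in the paper.
calib_indices = random.sample(range(len(train_set)), 1000)
calib_loader = DataLoader(Subset(train_set, calib_indices), batch_size=32, shuffle=False)

# The validation set is used only to report accuracy after calibration.
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)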
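The quantization settings in the Experiment Setup row (symmetric channel-wise MinMax quantization for weights, asymmetric layer-wise quantization for activations) correspond to standard uniform quantizers. The sketch below is a plain-NumPy illustration of those two quantizers at 8 bits; it is not the authors' implementation, and the Power-of-Two Factor applied to LayerNorm inputs (K = 3) is omitted.

import numpy as np

def quantize_weights_symmetric_per_channel(w, n_bits=8):
    """Symmetric, channel-wise MinMax quantization of a weight tensor,
    assuming output channels are on axis 0."""
    qmax = 2 ** (n_bits - 1) - 1
    # Per-output-channel scale from the largest absolute value (MinMax).
    max_abs = np.max(np.abs(w.reshape(w.shape[0], -1)), axis=1)
    scale = np.maximum(max_abs, 1e-8) / qmax
    scale = scale.reshape(-1, *([1] * (w.ndim - 1)))
    w_q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return w_q, scale  # dequantize with w_q * scale

def quantize_activations_asymmetric_per_layer(x, n_bits=8):
    """Asymmetric, layer-wise MinMax quantization of an activation tensor."""
    qmax = 2 ** n_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = max(x_max - x_min, 1e-8) / qmax
    zero_point = np.round(-x_min / scale)
    x_q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return x_q, scale, zero_point  # dequantize with (x_q - zero_point) * scale

In a post-training setting, the activation ranges (x_min, x_max) would be collected once over the calibration loader and then frozen for evaluation.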