FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Authors: Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on various transformer-based architectures and benchmarks show that our Fully Quantized Vision Transformer (FQ-ViT) outperforms previous works while even using lower bit-width on attention maps. For instance, we reach 84.89% top-1 accuracy with ViT-L on ImageNet and 50.8 mAP with Cascade Mask R-CNN (Swin-S) on COCO. To our knowledge, we are the first to achieve lossless accuracy degradation (∼1%) on fully quantized vision transformers. |
| Researcher Affiliation | Industry | Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li and Shuchang Zhou, MEGVII Technology. linyang.zhh@gmail.com, {zhangtianyu, sunpeiqin, lizheng02, zsc}@megvii.com |
| Pseudocode | No | The paper describes algorithms and formulas but does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/megvii-research/FQ-ViT. |
| Open Datasets | Yes | ImageNet [Krizhevsky et al., 2012] and COCO [Lin et al., 2014]. We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. |
| Dataset Splits | Yes | We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments. It only mentions general concepts like 'resource-constrained hardware devices' and 'floating-point units in the hardware'. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. Apart from special notes, we perform symmetric channel-wise quantization for weights and asymmetric layer-wise quantization for activations. For a fair comparison, the quantization for weights is fixed as MinMax. The hyperparameter K in Power-of-Two Factor is set to 3. (A hedged code sketch of this setup appears after the table.) |
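
The quantization recipe quoted in the rows above (symmetric channel-wise MinMax for weights, asymmetric layer-wise quantization for activations, and a Power-of-Two Factor with K = 3) is concrete enough to sketch. The NumPy snippet below is a minimal illustration of that setup as I read it, not code from the FQ-ViT repository: all function names are invented, the Power-of-Two Factor search is a brute-force stand-in for the paper's calibration rule, and it is shown in a symmetric form for simplicity even though the paper applies it to asymmetrically quantized LayerNorm inputs.

```python
import numpy as np

def quantize_weights_minmax(w, n_bits=8):
    """Symmetric channel-wise MinMax quantization: one scale per output
    channel (axis 0), zero point fixed at 0, as in the paper's weight
    setting. Dequantize with w ≈ q * scale."""
    qmax = 2 ** (n_bits - 1) - 1                     # e.g. 127 for 8 bits
    max_abs = np.abs(w).max(axis=1, keepdims=True)   # per-channel MinMax range
    scale = np.maximum(max_abs, 1e-8) / qmax         # guard against zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_activations_minmax(x, n_bits=8):
    """Asymmetric layer-wise MinMax quantization: a single scale and zero
    point for the whole tensor, as in the paper's activation setting.
    Dequantize with x ≈ (q - zp) * scale."""
    qmax = 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = max(x_max - x_min, 1e-8) / qmax
    zp = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zp, 0, qmax).astype(np.uint8)
    return q, scale, zp

def ptf_alphas(x, n_bits=8, K=3):
    """Hypothetical sketch of Power-of-Two Factor calibration. Every
    channel c shares one layer-wise scale s but amplifies it by
    2 ** alpha_c with alpha_c in {0, ..., K}; K = 3 matches the paper.
    Here alpha_c is picked by brute-force reconstruction error over a
    symmetric quantizer, which may differ from the paper's exact rule."""
    qmax = 2 ** (n_bits - 1) - 1
    # Base scale sized so the largest factor 2**K covers the full range.
    s = max(float(np.abs(x).max()) / (qmax * 2 ** K), 1e-8)
    alphas = np.empty(x.shape[-1], dtype=np.int64)
    for c in range(x.shape[-1]):
        xc = x[..., c]
        errors = []
        for a in range(K + 1):
            sc = s * 2 ** a
            q = np.clip(np.round(xc / sc), -qmax - 1, qmax)
            errors.append(float(np.square(xc - q * sc).mean()))
        alphas[c] = int(np.argmin(errors))           # best factor per channel
    return alphas, s
```

In the paper's pipeline the MinMax statistics would be collected over the 1000 sampled calibration images rather than a single tensor, and the per-channel powers of two in `ptf_alphas` are what let a layer-wise quantizer absorb the inter-channel variation of LayerNorm inputs while staying implementable with bit shifts.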