reproducibilityindex.ai

FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Authors: Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou

IJCAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive experiments on various transformer-based architectures and benchmarks show that our Fully Quantized Vision Transformer (FQ-ViT) outperforms previous works while even using lower bitwidth on attention maps. For instance, we reach 84.89% top-1 accuracy with ViT-L on ImageNet and 50.8 mAP with Cascade Mask R-CNN (Swin-S) on COCO. To our knowledge, we are the first to achieve lossless accuracy degradation (~1%) on fully quantized vision transformers.
Researcher Affiliation	Industry	Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, and Shuchang Zhou; MEGVII Technology; linyang.zhh@gmail.com, {zhangtianyu, sunpeiqin, lizheng02, zsc}@megvii.com
Pseudocode	No	The paper describes algorithms and formulas but does not include explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code is available at https://github.com/megvii-research/FQ-ViT.
Open Datasets	Yes	We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. Apart from special notes, we perform symmetric channel-wise quantization for weights and asymmetric layer-wise quantization for activations. For a fair comparison, the quantization for weights is fixed as MinMax. The hyperparameter K in Power-of-Two Factor is set to 3. Datasets cited: ImageNet [Krizhevsky et al., 2012] and COCO [Lin et al., 2014].
Dataset Splits	Yes	We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. Apart from special notes, we perform symmetric channel-wise quantization for weights and asymmetric layer-wise quantization for activations. For a fair comparison, the quantization for weights is fixed as MinMax. The hyperparameter K in Power-of-Two Factor is set to 3.
Hardware Specification	No	The paper does not specify the hardware used for running the experiments. It only mentions general concepts like 'resource-constrained hardware devices' and 'floating-point units in the hardware'.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers.
Experiment Setup	Yes	We randomly sample 1000 training images from ImageNet or COCO as the calibration data, and use the validation set to evaluate performance. Apart from special notes, we perform symmetric channel-wise quantization for weights and asymmetric layer-wise quantization for activations. For a fair comparison, the quantization for weights is fixed as MinMax. The hyperparameter K in Power-of-Two Factor is set to 3.
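The setup rows name three calibration choices: symmetric channel-wise MinMax for weights, asymmetric layer-wise MinMax for activations, and a Power-of-Two Factor (PTF) with K = 3. The Python sketch below illustrates them on plain lists. It is not taken from the FQ-ViT repository, and every function name in it is hypothetical. It also makes several assumptions of its own: rounding uses Python's built-in round (half-to-even), the activation range is widened to include 0.0, and PTF is modelled as a shared layer-wise base scale s plus a per-channel exponent alpha_c in {0, ..., K}, chosen by minimizing squared quantize-dequantize error on the calibration values. Check the official code for the exact formulas.

```python
from typing import List, Tuple


def clamp(v: int, lo: int, hi: int) -> int:
    """Clip an integer code into the representable range [lo, hi]."""
    return max(lo, min(hi, v))


def quantize_weights_symmetric_per_channel(
    w: List[List[float]], bits: int = 8
) -> Tuple[List[List[int]], List[float]]:
    """MinMax, symmetric, channel-wise: one scale per output channel (row),
    zero point fixed at 0, signed codes in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    codes, scales = [], []
    for row in w:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # guard all-zero rows
        codes.append([clamp(round(v / scale), -qmax, qmax) for v in row])
        scales.append(scale)
    return codes, scales


def calibrate_activations_asymmetric_per_layer(
    calib: List[float], bits: int = 8
) -> Tuple[float, int]:
    """MinMax, asymmetric, layer-wise: one (scale, zero point) pair for the
    whole tensor. The range is widened to include 0.0 (assumed convention)."""
    qmax = 2 ** bits - 1
    lo, hi = min(min(calib), 0.0), max(max(calib), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = clamp(round(-lo / scale), 0, qmax)
    return scale, zero_point


def quantize_act(x: float, scale: float, zp: int, bits: int = 8) -> int:
    """Map a float activation to an unsigned code in [0, 2**bits - 1]."""
    return clamp(round(x / scale) + zp, 0, 2 ** bits - 1)


def dequantize_act(q: int, scale: float, zp: int) -> float:
    """Map an unsigned code back to a float."""
    return (q - zp) * scale


def _ptf_sq_error(ch: List[float], step: float, zp: int, qmax: int) -> float:
    """Squared quantize-dequantize error of one channel at a given step size."""
    err = 0.0
    for v in ch:
        q = clamp(round(v / step) + zp, 0, qmax)
        err += (v - (q - zp) * step) ** 2
    return err


def calibrate_ptf(
    channels: List[List[float]], bits: int = 8, K: int = 3
) -> Tuple[float, int, List[int]]:
    """Hypothetical PTF sketch: shared base scale s and zero point for the
    layer, plus a per-channel exponent alpha_c in {0, ..., K}; channel c is
    quantized with step 2**alpha_c * s."""
    qmax = 2 ** bits - 1
    values = [v for ch in channels for v in ch]
    lo, hi = min(values), max(values)
    # The widest channel should use alpha = K, so the layer-wide MinMax step
    # is divided by 2**K to obtain the base scale s.
    s = (hi - lo) / qmax / 2 ** K if hi > lo else 1.0
    zp = clamp(round(-lo / (2 ** K * s)), 0, qmax)
    # For each channel pick the exponent with the lowest error;
    # min() keeps the smallest alpha among ties.
    alphas = [
        min(range(K + 1), key=lambda a: _ptf_sq_error(ch, 2 ** a * s, zp, qmax))
        for ch in channels
    ]
    return s, zp, alphas
```

For example, `calibrate_ptf([[-0.1, 0.0, 0.1], [-8.0, 0.0, 8.0]], K=3)` assigns alpha = 0 to the narrow channel and alpha = 3 to the wide one, so the narrow channel keeps a fine step instead of inheriting the coarse step a single layer-wise scale would force on it. In the described setup, the calibration lists would hold activations collected from the 1000 sampled training images.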