ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Authors: Yunshan Zhong, Jiawei Hu, You Huang, Yuxin Zhang, Rongrong Ji

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results attest to the effectiveness of our approach. Notably, ERQ surpasses the state-of-the-art GPTQ by 22.36% in accuracy for W3A4 ViT-S.
Researcher Affiliation | Academia | ¹Institute of Artificial Intelligence, Xiamen University; ²Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; ³Department of Artificial Intelligence, School of Informatics, Xiamen University; ⁴Peng Cheng Laboratory.
Pseudocode | Yes | Algorithm 1: Weight Quantization Error Reduction
Open Source Code | No | The paper does not provide explicit statements about the release of its own source code for ERQ, nor does it include a link to a code repository for its method.
Open Datasets | Yes | We conduct extensive experiments on image classification, object detection, and instance segmentation. For the image classification task, we evaluate ERQ on the ImageNet dataset (Russakovsky et al., 2015), with different ViT variants including ViT (Dosovitskiy et al., 2021), DeiT (Touvron et al., 2021), and Swin (Liu et al., 2021a). As for object detection and instance segmentation tasks, we evaluate ERQ on the COCO dataset (Lin et al., 2014) with Mask R-CNN (He et al., 2017) and Cascade Mask R-CNN (Cai & Vasconcelos, 2018), both using Swin (Liu et al., 2021a) as their backbone.
Dataset Splits | Yes | Consistent with a previous study (Li et al., 2023), we randomly select 32 images from the ImageNet dataset and 1 image from the COCO dataset. The quantization parameters are determined by forwarding the calibration datasets, and the reparameterization technique is used to initialize the activation quantizer as in (Li et al., 2023).
Hardware Specification | Yes | All experiments are implemented using the PyTorch framework (Paszke et al., 2019) with a single NVIDIA 3090 GPU and an Intel Xeon 4214R CPU.
Software Dependencies | No | The paper mentions the 'PyTorch framework (Paszke et al., 2019)' and 'pulp (a CPU-only LP modeler written in Python)' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | In our experiments, the k and maximum iteration of Rounding Refinement are set to 1 and 100, respectively. We use pulp (a CPU-only LP modeler written in Python) to solve the MIPQ. For the image classification task, we set λ1 = λ2 = 1e4 for ViT, λ1 = λ2 = 1e3 for DeiT-T, λ1 = λ2 = 1e4 for DeiT-S and DeiT-B, and λ1 = λ2 = 1e4 for Swin. For detection and segmentation tasks, we set λ1 = λ2 = 1e5 for all models.
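The setup above states that pulp, a CPU-only LP/MILP modeler for Python, is used to solve the paper's MIPQ formulation. The sketch below is purely illustrative: it shows how a small mixed-integer rounding decision can be posed and solved with pulp's standard API. The variables, objective coefficients, and constraint are hypothetical placeholders, not ERQ's actual MIPQ, whose exact form is not reproduced in this report.

```python
# Illustrative sketch only: a toy mixed-integer program solved with pulp,
# mirroring the kind of per-weight rounding decision a MIPQ could encode.
# All coefficients and the constraint below are hypothetical, not from ERQ.
import pulp

# Binary variables b_i: round weight i down (0) or up (1).
n = 4
b = [pulp.LpVariable(f"b{i}", cat="Binary") for i in range(n)]

# Hypothetical linear surrogate of the error change if weight i is rounded up.
cost_up = [0.3, -0.1, 0.2, 0.05]

prob = pulp.LpProblem("toy_rounding", pulp.LpMinimize)
prob += pulp.lpSum(c * v for c, v in zip(cost_up, b))   # objective
prob += pulp.lpSum(b) <= n // 2                         # example budget constraint

prob.solve(pulp.PULP_CBC_CMD(msg=False))
rounding = [int(pulp.value(v)) for v in b]
print(rounding)  # only b1 has a negative cost, so it alone is rounded up: [0, 1, 0, 0]
```

Since pulp bundles the CBC solver, such a model runs entirely on the CPU, which is consistent with the paper's description of pulp as a CPU-only modeler.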