AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries

Authors: Runqi Wang, Huixin Sun, Linlin Yang, Shaohui Lin, Chuanjian Liu, Yan Gao, Yao Hu, Baochang Zhang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through our extensive experiments on large-scale open datasets, the performance of the 4-bit quantized DETR and Deformable DETR models is comparable to that of their full-precision counterparts.
Researcher Affiliation | Collaboration | Runqi Wang (1), Huixin Sun (1), Linlin Yang (2)*, Shaohui Lin (3), Chuanjian Liu (4), Yan Gao (5), Yao Hu (5), Baochang Zhang (1,6,7). (1) ASEE, EIE and Hangzhou Research Institute, Beihang University; (2) State Key Laboratory of Media Convergence and Communication, Communication University of China; (3) School of Computer Science and Technology, East China Normal University; (4) Huawei Noah's Ark Lab; (5) Xiaohongshu Inc.; (6) Zhongguancun Laboratory; (7) Nanchang Institute of Technology
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it include a link to a code repository.
Open Datasets | Yes | We conduct experiments on the PASCAL VOC dataset (Everingham et al. 2010) and the COCO 2017 object detection dataset (Lin et al. 2014).
Dataset Splits | Yes | We use VOC trainval2012 and VOC trainval2007 for training, and the VOC test2007 set for evaluation. For COCO 2017 object detection, we use its standard train and test split. (A data-loading sketch follows the table.)
Hardware Specification | Yes | We run the experiments on 8 NVIDIA Tesla A100 GPUs with 40 GB of memory each.
Software Dependencies | No | The paper states that 'PyTorch (Paszke et al. 2017) is used for implementing our baseline and AQ-DETR' but does not give version numbers for PyTorch or any other dependency.
Experiment Setup | Yes | We use Adam (Loshchilov and Hutter 2017) with a batch size of 8 and an initial learning rate of 2e-4. The scale factor of LLD, λ_dis, is set to 0.1 and K = 6 in Eq. 3 by default, and cross-entropy is chosen as the distillation loss. The quantized DETR is trained for 300 epochs, and the learning rate is multiplied by 0.1 at the 200th epoch. We use 100 object queries and 500 auxiliary queries for training, and 100 object queries for testing in the DETR framework. Following Deformable DETR, the quantized Deformable DETR is trained for 12 epochs, and the learning rate is multiplied by 0.1 at the 11th epoch on both the VOC and COCO datasets. We use 300 object queries and 1500 auxiliary queries for training, and 300 object queries for testing. (A training-schedule sketch follows the table.)
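
For reference, the splits named in the Dataset Splits row map onto torchvision's stock dataset wrappers as below. The paper does not describe its data pipeline, so the wrappers, dataset roots, and annotation paths here are illustrative assumptions, not the authors' code.

    # Sketch of the reported splits using torchvision wrappers (assumed, not
    # from the paper). Paths are placeholders for a local copy of each dataset.
    from torch.utils.data import ConcatDataset
    from torchvision.datasets import CocoDetection, VOCDetection

    # Training: union of VOC trainval2007 and VOC trainval2012.
    voc_train = ConcatDataset([
        VOCDetection("data/voc", year="2007", image_set="trainval"),
        VOCDetection("data/voc", year="2012", image_set="trainval"),
    ])
    # Evaluation: VOC test2007.
    voc_test = VOCDetection("data/voc", year="2007", image_set="test")

    # COCO 2017 "standard split": train2017 for training, val2017 for evaluation.
    coco_train = CocoDetection("data/coco/train2017",
                               "data/coco/annotations/instances_train2017.json")
    coco_val = CocoDetection("data/coco/val2017",
                             "data/coco/annotations/instances_val2017.json")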
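
The Experiment Setup row translates into a short optimizer/scheduler configuration. Below is a minimal sketch, assuming PyTorch and reading the cited Adam (Loshchilov and Hutter 2017) as the AdamW variant used by the original DETR codebase; the model is a placeholder and the query counts are carried over as constants.

    # Sketch of the reported optimization schedule (assumed PyTorch idioms).
    import torch
    from torch import nn

    model = nn.Linear(8, 8)  # placeholder for the quantized DETR

    # Batch size 8, initial learning rate 2e-4.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

    # Quantized DETR: 300 epochs, learning rate multiplied by 0.1 at epoch 200.
    # (Quantized Deformable DETR: 12 epochs, with the drop at epoch 11.)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[200], gamma=0.1)

    # Query budget reported by the paper: DETR trains with 100 object plus 500
    # auxiliary queries and tests with the 100 object queries alone; Deformable
    # DETR trains with 300 object plus 1500 auxiliary queries, testing with 300.
    NUM_OBJECT_QUERIES, NUM_AUX_QUERIES = 100, 500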