Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

Authors: Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of the proposed Q-ViT model for the image classification task using popular DeiT [25] and Swin [15] backbones. To the best of our knowledge, there is no publicly available source code on quantization-aware training of vision transformers at this point, so we implement the baseline and LSQ [5] methods by ourselves.
Researcher Affiliation | Collaboration | 1 Beihang University, Beijing, P.R. China; 2 Zhongguancun Laboratory, Beijing, P.R. China; 3 Shanghai Artificial Intelligence Laboratory, Shanghai, P.R. China; 4 Institute of Deep Learning, Baidu Research, Beijing, P.R. China; 5 National Engineering Laboratory for Deep Learning Technology and Application, Beijing, P.R. China
Pseudocode | No | The paper describes methods through textual descriptions and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our codes and models are attached on https://github.com/YanjingLi0202/Q-ViT.
Open Datasets | Yes | The experiments are carried out on the ILSVRC12 ImageNet classification dataset [12]. The ImageNet dataset is more challenging due to its large scale and greater diversity. There are 1000 classes and 1.2 million training images, and 50k validation images in it.
Dataset Splits | Yes | There are 1000 classes and 1.2 million training images, and 50k validation images in it.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions the use of the 'LAMB [34] optimizer' and 'DeiT [25] and Swin Transformer [15]' as backbones, but it does not specify version numbers for these or any other software components or libraries.
Experiment Setup | Yes | In our experiments, we initialize the weights of the quantized model with the corresponding pretrained full-precision model. The quantized model is trained for 300 epochs with batch-size 512 and the base learning rate 2e-4. We do not use a warm-up scheme. For all the experiments, we apply the LAMB [34] optimizer with weight decay set as 0. Other training settings follow DeiT [25] or Swin Transformer [15]. Note that we use 8-bit for the patch embedding (first) layer and the classification (last) layer following [5].
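
The Research Type row above notes that the authors implemented the LSQ [5] baseline themselves because no quantization-aware training code for vision transformers was publicly available at the time. As a rough illustration of what such a baseline involves, below is a minimal LSQ-style quantizer sketch in PyTorch; the class name, bit-width handling, and scale initialization are assumptions based on the original LSQ paper, not the authors' released code.

```python
import math
import torch
import torch.nn as nn


def grad_scale(x, scale):
    # Keep the forward value of x unchanged but scale its gradient by `scale`.
    return (x - x * scale).detach() + x * scale


def round_pass(x):
    # Straight-through estimator: round in the forward pass, identity gradient backward.
    return (x.round() - x).detach() + x


class LSQQuantizer(nn.Module):
    """Minimal LSQ quantizer sketch (learned step size quantization).

    `bits` and `symmetric` are illustrative parameters; the paper's own
    baseline may differ in initialization details and layer placement.
    """

    def __init__(self, bits=4, symmetric=True):
        super().__init__()
        if symmetric:                      # e.g. weights
            self.qn, self.qp = -2 ** (bits - 1), 2 ** (bits - 1) - 1
        else:                              # e.g. non-negative activations
            self.qn, self.qp = 0, 2 ** bits - 1
        self.scale = nn.Parameter(torch.tensor(1.0))
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # Common LSQ initialization: s = 2 * mean(|x|) / sqrt(Qp).
            self.scale.data.copy_(2 * x.abs().mean() / math.sqrt(self.qp))
            self.initialized = True
        # Gradient scaling factor g = 1 / sqrt(numel * Qp) from the LSQ paper.
        g = 1.0 / math.sqrt(x.numel() * self.qp)
        s = grad_scale(self.scale, g)
        return round_pass((x / s).clamp(self.qn, self.qp)) * s
```

In a quantization-aware training baseline, a quantizer like this would wrap the weights and input activations of each linear layer in the transformer blocks, while the first and last layers are kept at higher precision as the Experiment Setup row describes.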
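The Experiment Setup row quotes the paper's training recipe: 300 epochs, batch size 512, base learning rate 2e-4, no warm-up, the LAMB [34] optimizer with weight decay 0, and 8-bit quantization for the patch-embedding and classification layers. A minimal configuration sketch under those settings follows; `build_quantized_deit` is a hypothetical helper, the third-party `torch_optimizer` package is assumed only because PyTorch core does not ship LAMB, and the cosine schedule is inferred from the DeiT recipe the paper says it otherwise follows.

```python
import torch
import torch_optimizer  # third-party "torch-optimizer" package, assumed for LAMB

# Hyperparameters quoted from the paper's experiment setup.
EPOCHS = 300
BATCH_SIZE = 512
BASE_LR = 2e-4          # no warm-up is used
WEIGHT_DECAY = 0.0

# Hypothetical helper: the quantized model is initialized from the pretrained
# full-precision checkpoint, with the patch-embedding (first) and
# classification (last) layers kept at 8-bit.
model = build_quantized_deit(pretrained=True, weight_bits=4, act_bits=4,
                             first_last_bits=8)

optimizer = torch_optimizer.Lamb(model.parameters(),
                                 lr=BASE_LR,
                                 weight_decay=WEIGHT_DECAY)
# Cosine decay over the full 300 epochs, as in the DeiT training recipe (assumed).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
```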