Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
Authors: Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of the proposed Q-ViT model on the image classification task using the popular DeiT [25] and Swin [15] backbones. To the best of our knowledge, there is no publicly available source code for quantization-aware training of vision transformers at this point, so we implement the baseline and LSQ [5] methods ourselves. (A minimal LSQ sketch follows the table.) |
| Researcher Affiliation | Collaboration | 1 Beihang University, Beijing, P.R. China; 2 Zhongguancun Laboratory, Beijing, P.R. China; 3 Shanghai Artificial Intelligence Laboratory, Shanghai, P.R. China; 4 Institute of Deep Learning, Baidu Research, Beijing, P.R. China; 5 National Engineering Laboratory for Deep Learning Technology and Application, Beijing, P.R. China |
| Pseudocode | No | The paper describes methods through textual descriptions and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and models are available at https://github.com/YanjingLi0202/Q-ViT. |
| Open Datasets | Yes | The experiments are carried out on the ILSVRC12 ImageNet classification dataset [12]. The ImageNet dataset is more challenging due to its large scale and greater diversity: it contains 1,000 classes, 1.2 million training images, and 50k validation images. |
| Dataset Splits | Yes | It contains 1,000 classes, 1.2 million training images, and 50k validation images. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions the use of the 'LAMB [34] optimizer' and 'DeiT [25] and Swin Transformer [15]' as backbones, but it does not specify version numbers for these or any other software components or libraries. |
| Experiment Setup | Yes | In our experiments, we initialize the weights of the quantized model with the corresponding pretrained full-precision model. The quantized model is trained for 300 epochs with batch size 512 and a base learning rate of 2e-4. We do not use a warm-up scheme. For all experiments, we apply the LAMB [34] optimizer with weight decay set to 0. Other training settings follow DeiT [25] or Swin Transformer [15]. Note that we use 8-bit quantization for the patch embedding (first) layer and the classification (last) layer, following [5]. (See the configuration sketch after the table.) |
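The paper reports re-implementing LSQ [5] as its quantization-aware-training baseline. Below is a minimal PyTorch sketch of an LSQ-style quantizer, not the authors' implementation: the bit-width, signedness, and per-tensor step size are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LSQQuantizer(nn.Module):
    """Sketch of LSQ (Esser et al. [5]): one learnable step size `s` per
    tensor is trained jointly with the network, and the rounding op passes
    gradients via a straight-through estimator (STE)."""

    def __init__(self, n_bits: int = 4, signed: bool = True):
        super().__init__()
        if signed:
            self.q_min, self.q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
        else:
            self.q_min, self.q_max = 0, 2 ** n_bits - 1
        self.step = nn.Parameter(torch.tensor(1.0))  # learnable step size s

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # LSQ gradient scale for s: g = 1 / sqrt(numel * q_max)
        g = 1.0 / float(x.numel() * self.q_max) ** 0.5
        # Rescale the step's gradient by g without changing its value
        s = self.step * g + (self.step - self.step * g).detach()
        x_s = torch.clamp(x / s, self.q_min, self.q_max)
        # STE: round in the forward pass, identity gradient in the backward pass
        x_q = (x_s.round() - x_s).detach() + x_s
        return x_q * s
```

A quantizer like this would wrap each weight or activation tensor, e.g. `w_q = LSQQuantizer(4)(linear.weight)`; the paper's note about keeping the first and last layers at 8-bit corresponds to instantiating those quantizers with `n_bits=8`.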
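The reported training recipe (full-precision initialization, 300 epochs, base LR 2e-4, LAMB with weight decay 0, no warm-up) can be sketched as follows. This is an assumption-laden reconstruction: core PyTorch has no LAMB optimizer, so `timm.optim.Lamb` stands in for the authors' choice, the checkpoint path is hypothetical, and the cosine schedule is carried over from the DeiT defaults the paper says it otherwise follows. The batch size of 512 would live in the DataLoader, which is not shown.

```python
import torch
from timm.optim import Lamb  # stand-in: core PyTorch ships no LAMB optimizer


def build_training_setup(model: torch.nn.Module, fp32_ckpt: str):
    """Sketch of the reported recipe: initialize from the pretrained
    full-precision checkpoint, then train with LAMB, lr=2e-4,
    weight_decay=0, no warm-up, for 300 epochs."""
    # `fp32_ckpt` is a hypothetical path to the full-precision model
    state = torch.load(fp32_ckpt, map_location="cpu")
    model.load_state_dict(state, strict=False)  # quantizer params keep their init
    optimizer = Lamb(model.parameters(), lr=2e-4, weight_decay=0.0)
    # Assumed cosine decay over the full 300 epochs (DeiT-style default)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
    return optimizer, scheduler
```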