Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

Authors: Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of the proposed Q-ViT model for the image classification task using popular DeiT [25] and Swin [15] backbones. To the best of our knowledge, there is no publicly available source code on quantization-aware training of vision transformers at this point, so we implement the baseline and LSQ [5] methods by ourselves.
Researcher Affiliation | Collaboration | 1 Beihang University, Beijing, P.R. China; 2 Zhongguancun Laboratory, Beijing, P.R. China; 3 Shanghai Artificial Intelligence Laboratory, Shanghai, P.R. China; 4 Institute of Deep Learning, Baidu Research, Beijing, P.R. China; 5 National Engineering Laboratory for Deep Learning Technology and Application, Beijing, P.R. China
Pseudocode | No | The paper describes methods through textual descriptions and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our codes and models are attached on https://github.com/YanjingLi0202/Q-ViT.
Open Datasets | Yes | The experiments are carried out on the ILSVRC12 ImageNet classification dataset [12]. The ImageNet dataset is more challenging due to its large scale and greater diversity. There are 1000 classes and 1.2 million training images, and 50k validation images in it.
Dataset Splits | Yes | There are 1000 classes and 1.2 million training images, and 50k validation images in it.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions the use of the 'LAMB [34] optimizer' and 'DeiT [25] and Swin Transformer [15]' as backbones, but it does not specify version numbers for these or any other software components or libraries.
Experiment Setup | Yes | In our experiments, we initialize the weights of the quantized model with the corresponding pretrained full-precision model. The quantized model is trained for 300 epochs with batch-size 512 and the base learning rate 2e-4. We do not use a warm-up scheme. For all the experiments, we apply the LAMB [34] optimizer with weight decay set as 0. Other training settings follow DeiT [25] or Swin Transformer [15]. Note that we use 8-bit for the patch embedding (first) layer and the classification (last) layer following [5].
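
The Research Type row above notes that the authors implemented the LSQ [5] baseline themselves because no quantization-aware training code for vision transformers was publicly available at the time. As a rough illustration of what such a baseline involves, below is a minimal LSQ-style quantizer sketch in PyTorch; the class name, bit-width handling, and scale initialization are assumptions based on the original LSQ paper, not the authors' released code.

```python
import math
import torch
import torch.nn as nn


def grad_scale(x, scale):
    # Keep the forward value of x unchanged but scale its gradient by `scale`.
    return (x - x * scale).detach() + x * scale


def round_pass(x):
    # Straight-through estimator: round in the forward pass, identity gradient backward.
    return (x.round() - x).detach() + x


class LSQQuantizer(nn.Module):
    """Minimal LSQ quantizer sketch (learned step size quantization).

    `bits` and `symmetric` are illustrative parameters; the paper's own
    baseline may differ in initialization details and layer placement.
    """

    def __init__(self, bits=4, symmetric=True):
        super().__init__()
        if symmetric:                      # e.g. weights
            self.qn, self.qp = -2 ** (bits - 1), 2 ** (bits - 1) - 1
        else:                              # e.g. non-negative activations
            self.qn, self.qp = 0, 2 ** bits - 1
        self.scale = nn.Parameter(torch.tensor(1.0))
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # Common LSQ initialization: s = 2 * mean(|x|) / sqrt(Qp).
            self.scale.data.copy_(2 * x.abs().mean() / math.sqrt(self.qp))
            self.initialized = True
        # Gradient scaling factor g = 1 / sqrt(numel * Qp) from the LSQ paper.
        g = 1.0 / math.sqrt(x.numel() * self.qp)
        s = grad_scale(self.scale, g)
        return round_pass((x / s).clamp(self.qn, self.qp)) * s
```

In a quantization-aware training baseline, a quantizer like this would wrap the weights and input activations of each linear layer in the transformer blocks, while the first and last layers are kept at higher precision as the Experiment Setup row describes.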
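The Experiment Setup row quotes the paper's training recipe: 300 epochs, batch size 512, base learning rate 2e-4, no warm-up, the LAMB [34] optimizer with weight decay 0, and 8-bit quantization for the patch-embedding and classification layers. A minimal configuration sketch under those settings follows; `build_quantized_deit` is a hypothetical helper, the third-party `torch_optimizer` package is assumed only because PyTorch core does not ship LAMB, and the cosine schedule is inferred from the DeiT recipe the paper says it otherwise follows.

```python
import torch
import torch_optimizer  # third-party "torch-optimizer" package, assumed for LAMB

# Hyperparameters quoted from the paper's experiment setup.
EPOCHS = 300
BATCH_SIZE = 512
BASE_LR = 2e-4          # no warm-up is used
WEIGHT_DECAY = 0.0

# Hypothetical helper: the quantized model is initialized from the pretrained
# full-precision checkpoint, with the patch-embedding (first) and
# classification (last) layers kept at 8-bit.
model = build_quantized_deit(pretrained=True, weight_bits=4, act_bits=4,
                             first_last_bits=8)

optimizer = torch_optimizer.Lamb(model.parameters(),
                                 lr=BASE_LR,
                                 weight_decay=WEIGHT_DECAY)
# Cosine decay over the full 300 epochs, as in the DeiT training recipe (assumed).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
```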