Bi-ViT: Pushing the Limit of Vision Transformer Quantization
Authors: Yanjing Li, Sheng Xu, Mingbao Lin, Xianbin Cao, Chuanjian Liu, Xiao Sun, Baochang Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the ImageNet benchmark demonstrate that Bi-ViT surpasses both the baseline and prior binarized methods by a significant margin, achieving a remarkable acceleration rate of up to 61.5×. |
| Researcher Affiliation | Collaboration | Yanjing Li1*, Sheng Xu1*, Mingbao Lin2, Xianbin Cao1, Chuanjian Liu3, Xiao Sun4, Baochang Zhang5,6,7 — 1 Beihang University, 2 Tencent, 3 Huawei Noah's Ark Lab, 4 Shanghai Artificial Intelligence Laboratory, 5 Zhongguancun Laboratory, 6 Hangzhou Research Institute, Beihang University, 7 Nanchang Institute of Technology |
| Pseudocode | No | The paper presents mathematical formulations and architectural diagrams but does not include structured pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | Yes | Our codes and models are attached on https://github.com/YanjingLi0202/Bi-ViT/ |
| Open Datasets | Yes | The experiments are conducted on the ImageNet ILSVRC12 dataset (Krizhevsky, Sutskever, and Hinton 2012) for the image classification task. The ImageNet dataset is challenging due to its large scale and greater diversity. There are 1000 classes, 1.2 million training images, and 50k validation images in it. |
| Dataset Splits | Yes | The ImageNet dataset... There are 1000 classes, 1.2 million training images, and 50k validation images in it. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware components (e.g., GPU model, CPU type, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using LAMB optimizer and backbones like DeiT and Swin Transformer, but it does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In our experiments, we initialize the weights of the binarized model with the pretrained real-valued model. The binarized model is trained for 300 epochs with batch-size 512 and the base learning rate 5e-4 without warm-up scheme. For all the experiments, we apply the LAMB (You et al. 2020) optimizer with weight decay set as 0, following DeiT III (Touvron, Cord, and Jégou 2022). Note that we keep the patch embedding (first) layer and the classification (last) layer as real-valued, following (Esser et al. 2019). |
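
The quoted experiment setup maps directly onto a short training configuration. Below is a minimal sketch in PyTorch-style Python, assuming a DeiT backbone from `timm` and the `torch_optimizer` LAMB implementation; the model name, the `binarize_vit` wrapper, and the loop skeleton are placeholders for illustration, not the authors' released code.

```python
# Minimal sketch of the quoted training configuration.
# Library choices (timm, torch_optimizer) and the binarization wrapper
# are assumptions, not the code from the Bi-ViT repository.
import timm
import torch
import torch_optimizer as optim

EPOCHS = 300          # "trained for 300 epochs"
BATCH_SIZE = 512      # "batch-size 512"
BASE_LR = 5e-4        # "base learning rate 5e-4 without warm-up scheme"
WEIGHT_DECAY = 0.0    # "weight decay set as 0", following DeiT III

# Initialize the binarized model from pretrained real-valued weights.
model = timm.create_model("deit_small_patch16_224", pretrained=True)

# Hypothetical wrapper: binarize the transformer blocks while keeping the
# patch-embedding (first) and classification (last) layers real-valued,
# as stated in the paper.
# model = binarize_vit(model, skip=["patch_embed", "head"])

optimizer = optim.Lamb(
    model.parameters(),
    lr=BASE_LR,
    weight_decay=WEIGHT_DECAY,
)

# Standard training-loop skeleton (ImageNet data loaders omitted).
# for epoch in range(EPOCHS):
#     for images, targets in train_loader:
#         loss = torch.nn.functional.cross_entropy(model(images), targets)
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
```

The commented-out binarization call and loop are only placeholders; the repository linked above should be consulted for the actual binarized modules and training script.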