Bi-ViT: Pushing the Limit of Vision Transformer Quantization
Authors: Yanjing Li, Sheng Xu, Mingbao Lin, Xianbin Cao, Chuanjian Liu, Xiao Sun, Baochang Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the ImageNet benchmark demonstrate that Bi-ViT surpasses both the baseline and prior binarized methods by a significant margin, achieving a remarkable acceleration rate of up to 61.5×. |
| Researcher Affiliation | Collaboration | Yanjing Li1*, Sheng Xu1*, Mingbao Lin2, Xianbin Cao1, Chuanjian Liu3, Xiao Sun4, Baochang Zhang5,6,7 — 1 Beihang University, 2 Tencent, 3 Huawei Noah's Ark Lab, 4 Shanghai Artificial Intelligence Laboratory, 5 Zhongguancun Laboratory, 6 Hangzhou Research Institute, Beihang University, 7 Nanchang Institute of Technology |
| Pseudocode | No | The paper presents mathematical formulations and architectural diagrams but does not include structured pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | Yes | Our codes and models are attached on https://github.com/YanjingLi0202/Bi-ViT/ |
| Open Datasets | Yes | The experiments are conducted on the ImageNet ILSVRC12 dataset (Krizhevsky, Sutskever, and Hinton 2012) for the image classification task. The ImageNet dataset is challenging due to its large scale and greater diversity. There are 1000 classes, 1.2 million training images, and 50k validation images in it. |
| Dataset Splits | Yes | The ImageNet dataset... There are 1000 classes, 1.2 million training images, and 50k validation images in it. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware components (e.g., GPU model, CPU type, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using LAMB optimizer and backbones like DeiT and Swin Transformer, but it does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In our experiments, we initialize the weights of the binarized model with the pretrained real-valued model. The binarized model is trained for 300 epochs with batch-size 512 and the base learning rate 5e-4 without warm-up scheme. For all the experiments, we apply the LAMB (You et al. 2020) optimizer with weight decay set as 0, following DeiT III (Touvron, Cord, and Jégou 2022). Note that we keep the patch embedding (first) layer and the classification (last) layer as real-valued, following (Esser et al. 2019). |
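
The quoted experiment setup maps directly onto a short training configuration. Below is a minimal sketch in PyTorch-style Python, assuming a DeiT backbone from `timm` and the `torch_optimizer` LAMB implementation; the model name, the `binarize_vit` wrapper, and the loop skeleton are placeholders for illustration, not the authors' released code.

```python
# Minimal sketch of the quoted training configuration.
# Library choices (timm, torch_optimizer) and the binarization wrapper
# are assumptions, not the code from the Bi-ViT repository.
import timm
import torch
import torch_optimizer as optim

EPOCHS = 300          # "trained for 300 epochs"
BATCH_SIZE = 512      # "batch-size 512"
BASE_LR = 5e-4        # "base learning rate 5e-4 without warm-up scheme"
WEIGHT_DECAY = 0.0    # "weight decay set as 0", following DeiT III

# Initialize the binarized model from pretrained real-valued weights.
model = timm.create_model("deit_small_patch16_224", pretrained=True)

# Hypothetical wrapper: binarize the transformer blocks while keeping the
# patch-embedding (first) and classification (last) layers real-valued,
# as stated in the paper.
# model = binarize_vit(model, skip=["patch_embed", "head"])

optimizer = optim.Lamb(
    model.parameters(),
    lr=BASE_LR,
    weight_decay=WEIGHT_DECAY,
)

# Standard training-loop skeleton (ImageNet data loaders omitted).
# for epoch in range(EPOCHS):
#     for images, targets in train_loader:
#         loss = torch.nn.functional.cross_entropy(model(images), targets)
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
```

The commented-out binarization call and loop are only placeholders; the repository linked above should be consulted for the actual binarized modules and training script.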