ViT-Calibrator: Decision Stream Calibration for Vision Transformer

Authors: Lin Chen, Zhijie Jia, Lechao Cheng, Yang Gao, Jie Lei, Yijun Bei, Zunlei Feng

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on commonly used datasets show that the proposed approach can achieve promising results." "Extensive experiments show that the proposed ViT-Calibrator can effectively calibrate the erroneous feature stream in the forward propagation and further improve the performance of the Transformer model." "In the experiment, the adopted classifiers, datasets, and experiment settings are listed as follows. Dataset. The datasets contain CIFAR-100 (Krizhevsky, Hinton et al. 2009), ImageNet-1K (Deng et al. 2009) and ImageNet-Real (Hendrycks et al. 2021)." "Table 1: The base and improved accuracy of 7 classifiers on 3 mainstream classification datasets." "Ablation Study."
Researcher Affiliation | Collaboration | Lin Chen (1), Zhijie Jia (1), Lechao Cheng (2), Yang Gao (1), Jie Lei (3), Yijun Bei (1*), Zunlei Feng (1). Affiliations: (1) Zhejiang University, (2) Zhejiang Lab, (3) Zhejiang University of Technology. Emails: {lin chen, jiazhijie, roygao, beiyj, zunleifeng}@zju.edu.cn, chenglc@zhejianglab.com, jasonlei@zjut.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to its own source code. It only references a third-party library: "Wightman 2020. PyTorch Image Models (Timm). https://github.com/rwightman/pytorch-image-models."
Open Datasets | Yes | "Dataset. The datasets contain CIFAR-100 (Krizhevsky, Hinton et al. 2009), ImageNet-1K (Deng et al. 2009) and ImageNet-Real (Hendrycks et al. 2021)."
Dataset Splits | No | The paper uses well-known datasets that have standard splits, but it does not explicitly provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) within the text.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using the timm library and PyTorch but does not specify exact version numbers for these or any other software components, which is necessary for reproducibility.
Experiment Setup | Yes | "For parameter settings in the token calibration stage, we chose the third layer as the starting output layer for feedback and the final layer as the feedback input layer, with a total of three feedback layers. For parameter settings in the dimension calibration stage, we use the mean correlation of J images as the threshold selection criterion for the selection threshold v corresponding to specific category dimensions. In the category-related dimension filtering step, according to different models, we select the dimension within the range of 40% to 60% as relevant dimensions by default. In the baseline setting, for ImageNet, we use the pre-training weights publicly available in the timm library (Wightman 2020). Additionally, we fix the random seed to ensure the stability of the experiment."
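The quoted experiment-setup passage can be summarized as a configuration sketch. Since the authors released no code, every name below (`VIT_CALIBRATOR_CONFIG`, `set_seed`, all keys and the concrete seed value) is a hypothetical illustration of the reported hyperparameters, not the authors' implementation; J (the number of reference images) is left unspecified, as in the paper.

```python
import random

# Hypothetical configuration mirroring the quoted setup; all keys and names
# are illustrative assumptions, not taken from the (unreleased) author code.
VIT_CALIBRATOR_CONFIG = {
    "token_calibration": {
        "feedback_start_layer": 3,    # third layer: starting output layer for feedback
        "feedback_input_layer": -1,   # final layer acts as the feedback input layer
        "num_feedback_layers": 3,     # total of three feedback layers
    },
    "dimension_calibration": {
        # The selection threshold v is chosen per category as the mean
        # correlation over J images (J is not specified in the quoted text).
        "threshold_criterion": "mean_correlation",
        # Category-related dimension filtering keeps 40%-60% of dimensions
        # as relevant, depending on the model.
        "relevant_dim_fraction": (0.40, 0.60),
    },
    "baseline": {
        "imagenet_weights": "timm_pretrained",  # Wightman 2020
        "seed": 0,  # a fixed seed is reported; the actual value is not
    },
}

def set_seed(seed: int) -> None:
    """Fix the Python RNG seed for run-to-run stability (frameworks such as
    torch/numpy would need their own seeding calls as well)."""
    random.seed(seed)

set_seed(VIT_CALIBRATOR_CONFIG["baseline"]["seed"])
```

This only captures the hyperparameters the paper states explicitly; the 40%-60% filtering range is model-dependent, so it is recorded as an interval rather than a single value.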