Statistical Test for Attention Maps in Vision Transformers

Authors: Tomohiro Shiraishi, Daiki Miwa, Teruyuki Katsuoka, Vo Nguyen Le Duy, Kouichi Taji, Ichiro Takeuchi

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the validity and the effectiveness of the proposed method through numerical experiments and applications to brain image diagnoses. (Also cited: Section 4, Numerical Experiments.)
Researcher Affiliation | Academia | (1) Nagoya University, Aichi, Japan; (2) Nagoya Institute of Technology, Aichi, Japan; (3) University of Information Technology, Ho Chi Minh City, Vietnam; (4) Vietnam National University, Ho Chi Minh City, Vietnam; (5) RIKEN, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1: Selective p-value Computation by Adaptive Grid Search.
Open Source Code | Yes | For reproducibility, our implementation is available at https://github.com/shirara1016/statistical_test_for_vit_attention.
Open Datasets | Yes | We examined the brain image dataset extracted from the dataset used in Buda et al. (2019), which included 939 and 941 images with and without tumors, respectively.
Dataset Splits | No | The paper states specific training and test sizes for the brain image dataset ("700 images each with and without tumors for training"; "The remaining images with and without tumors were used for testing"), but it does not specify a separate validation split or give explicit counts or percentages for every split of every dataset, as reproduction would require.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU types, or memory specifications, for the experiments.
Software Dependencies | No | The paper mentions using TensorFlow for automatic differentiation but does not give version numbers for TensorFlow or any other software dependency.
Experiment Setup | Yes | In all experiments, we set the threshold value $\tau = 0.6$, the grid search interval $[-S, S]$ with $S = 10 + |z_{\mathrm{obs}}|$, the minimum grid width $\varepsilon_{\min} = 10^{-4}$, the maximum grid width $\varepsilon_{\max} = 0.2$, and the significance level $\alpha = 0.05$.
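The following is a minimal sketch of how a selective p-value could be computed by an adaptive grid search with the reported settings. It assumes, as is common in selective inference, that the test statistic is standardized to N(0, 1) under the null hypothesis and that the data are restricted to a line parameterized by a scalar z, so the conditioning event becomes a union of intervals in z. The names `adaptive_grid_region`, `selective_p_value`, and `toy_selection` are hypothetical. The bisection-based step refinement is an illustrative stand-in, not necessarily the refinement rule of Algorithm 1 or the released code. The defaults mirror the reported setup (S = 10 + |z_obs|, eps_min = 1e-4, eps_max = 0.2, alpha = 0.05). The threshold tau = 0.6 would enter only through the selection function (which patches exceed tau at a given z), and here that function is replaced by a one-line toy.

```python
import math
from typing import Callable, Hashable, List, Optional, Tuple

Interval = Tuple[float, float]


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def adaptive_grid_region(
    selection: Callable[[float], Hashable],
    z_obs: float,
    eps_min: float = 1e-4,
    eps_max: float = 0.2,
) -> List[Interval]:
    """Approximate {z in [-S, S] : selection(z) == selection(z_obs)}, S = 10 + |z_obs|.

    Hypothetical refinement rule (not necessarily the paper's): walk the grid
    in steps of eps_max and, when the selection flips between two neighbouring
    points, bisect that step until it is at most eps_min wide. Flips that
    cancel out inside a single eps_max step are not detected.
    """
    S = 10.0 + abs(z_obs)
    target = selection(z_obs)
    intervals: List[Interval] = []
    z = -S
    inside = selection(z) == target
    start: Optional[float] = z if inside else None
    while z < S:
        z_next = min(z + eps_max, S)
        inside_next = selection(z_next) == target
        if inside_next != inside:
            lo, hi = z, z_next  # lo has status `inside`, hi has `inside_next`
            while hi - lo > eps_min:
                mid = 0.5 * (lo + hi)
                if (selection(mid) == target) == inside:
                    lo = mid
                else:
                    hi = mid
            boundary = 0.5 * (lo + hi)
            if inside:
                intervals.append((start, boundary))  # leaving the region
                start = None
            else:
                start = boundary  # entering the region
        z, inside = z_next, inside_next
    if start is not None:
        intervals.append((start, S))
    return intervals


def selective_p_value(intervals: List[Interval], z_obs: float) -> float:
    """Two-sided p-value for a N(0, 1) statistic truncated to `intervals`."""
    def mass(a: float, b: float) -> float:
        return max(normal_cdf(b) - normal_cdf(a), 0.0)

    t = abs(z_obs)
    denom = sum(mass(a, b) for a, b in intervals)
    num = 0.0
    for a, b in intervals:
        if b > t:
            num += mass(max(a, t), b)  # part of the interval in [t, inf)
        if a < -t:
            num += mass(a, min(b, -t))  # part of the interval in (-inf, -t]
    return num / denom


# Toy stand-in for "the set of patches whose attention exceeds tau at z";
# here the selection is simply whether z > 1 (hypothetical).
def toy_selection(z: float) -> bool:
    return z > 1.0


alpha = 0.05
region = adaptive_grid_region(toy_selection, z_obs=2.0)
p = selective_p_value(region, z_obs=2.0)
reject = p <= alpha
```

For the toy selection, the recovered region is approximately (1, 12], so `p` should be close to the closed-form value (Φ(12) − Φ(2)) / (Φ(12) − Φ(1)), which is well above alpha. Refining only where the selection flips keeps the number of (in practice expensive) selection evaluations low, at the cost of possibly missing regions narrower than eps_max.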