Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention

Authors: Saebom Leem, Hyunseok Seo

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a result, our method outperforms the previous leading explainability methods of ViT in the weakly-supervised localization task and presents great capability in capturing the full instances of the target class object. Meanwhile, our method provides a visualization that faithfully explains the model, which is demonstrated in the perturbation comparison test. In this section, we present the results of the performance comparison of our method with previous leading methods.
Researcher Affiliation | Academia | Saebom Leem (1,2), Hyunseok Seo (1*); 1 Korea Institute of Science and Technology, 2 Sogang University; toqha1215@sogang.ac.kr, seo@kist.kr
Pseudocode | No | The paper describes its methodology using text and mathematical equations (e.g., Eq. 1-8) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or a direct link to, open-source code for the methodology described.
Open Datasets | Yes | For the evaluation, we used the validation set of ImageNet ILSVRC 2012 (Russakovsky et al. 2015) and Pascal VOC 2012 (Everingham et al. 2012) and the test set of Caltech-UCSD Birds-200-2011 (CUB 200) (Wah et al. 2011), which provide the bounding-box annotation label.
Dataset Splits | Yes | The result of the weakly-supervised object detection on the ImageNet ILSVRC 2012 validation set is presented in Table 1. The localization performance on the Pascal VOC 2012 validation set is presented in Table 2.
Hardware Specification | No | The paper mentions evaluating methods with a 'ViT-base model' but does not specify any hardware details such as the GPU or CPU models used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers, such as programming language versions or library versions (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | All methods are evaluated with the same ViT-base (Dosovitskiy et al. 2020) model that takes the input image with a size of [224 × 224 × 3]. All methods share the same model parameters, and the fine-tuning details of the model parameters are provided in the supplementary material. In this ViT, the input images are converted into a [14 × 14] grid of patches, and therefore each method generates a heatmap with a size of [14 × 14 × 1] where one pixel corresponds to the contribution of one image patch of the input image.
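To make the patch-grid arithmetic in the experiment-setup row concrete, the sketch below illustrates how a ViT-base patch size of 16 turns a [224 × 224 × 3] input into a [14 × 14] grid, and how a [14 × 14 × 1] per-patch heatmap could be upsampled back to the input resolution for the weakly-supervised localization evaluation. This is an illustration under assumptions, not the authors' released code (no code is released); the use of PyTorch and bilinear upsampling is an assumption, since the paper does not name its software stack.

```python
# Minimal sketch (assumptions: PyTorch, bilinear upsampling); not the authors' code.
import torch
import torch.nn.functional as F

IMG_SIZE = 224                 # input image is [224 x 224 x 3]
PATCH_SIZE = 16                # ViT-base uses 16 x 16 pixel patches
GRID = IMG_SIZE // PATCH_SIZE  # -> 14 patches per side

# A placeholder per-patch attribution map with one value per image patch,
# i.e. shape [14 x 14 x 1] as described in the experiment setup.
heatmap = torch.rand(GRID, GRID, 1)

# Upsample the patch-level heatmap to the input resolution so it can be
# compared against bounding-box annotations in the localization test.
heatmap_full = F.interpolate(
    heatmap.permute(2, 0, 1).unsqueeze(0),  # [1, 1, 14, 14]
    size=(IMG_SIZE, IMG_SIZE),
    mode="bilinear",
    align_corners=False,
).squeeze()                                 # [224, 224]

print(GRID, heatmap_full.shape)  # 14 torch.Size([224, 224])
```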