Improving Interpretation Faithfulness for Vision Transformers
Authors: Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, Di Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the effectiveness of our approach through comprehensive experiments and evaluations. Results show that FViTs are more robust against adversarial attacks while maintaining the explainability of attention, indicating higher faithfulness. |
| Researcher Affiliation | Academia | King Abdullah University of Science and Technology (KAUST); Provable Responsible AI and Data Analytics (PRADA) Lab; SDAIA-KAUST AI; Lehigh University; University of Georgia; Iowa State University. |
| Pseudocode | Yes | Algorithm 1: FViTs via Denoised Diffusion Smoothing; Algorithm 2: Finding the Faithfulness Region in FViTs |
| Open Source Code | No | The paper references a third-party tool's GitHub repository ('Jacobgil/vit-explain') but does not state that the code for its own methodology is open-source or provide a link for it. |
| Open Datasets | Yes | For the classification task, we use ILSVRC-2012 ImageNet. For segmentation, we use the ImageNet-segmentation subset (Guillaumin et al., 2014), COCO (Lin et al., 2014), and Cityscapes (Cordts et al., 2016). |
| Dataset Splits | No | The paper mentions implementing "early stopping with a criterion of 20 epochs", which implies a validation set, and refers to an "ImageNet-1k sampled validation set" in Table 8. However, it does not provide specific details on the split percentages or counts for training, validation, and test sets to enable full reproduction of the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running its experiments. It only mentions time costs in seconds per image. |
| Software Dependencies | No | The paper mentions using "the timm library" for feature extractors and the "Adam optimizer" but does not specify version numbers for these or for other software components such as Python, PyTorch, or CUDA, which are necessary for a reproducible setup. |
| Experiment Setup | Yes | For the downstream dataset, we then fine-tuned these models using the Adam optimizer with a learning rate of 0.001 for a total of 50 epochs, with a batch size of 128. To prevent overfitting, we implemented early stopping with a criterion of 20 epochs. For data augmentation, we follow the common practice: Resize(256), CenterCrop(224), ToTensor, Normalization. The mean and standard deviation of normalization are both [0.5, 0.5, 0.5]. (See the sketch after this table.) |
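
The setup quoted in the Experiment Setup row maps onto a standard PyTorch/timm fine-tuning script. The following is a minimal sketch of that recipe, not the authors' released code: the model identifier (`vit_base_patch16_224`), the dataset directory paths, and the use of validation accuracy as the early-stopping signal are assumptions, while the optimizer, learning rate, epoch budget, batch size, patience, and preprocessing pipeline follow the paper's description.

```python
# Hedged sketch of the reported fine-tuning setup. Model name, dataset paths,
# and the early-stopping metric are assumptions; the hyperparameters
# (Adam, lr=0.001, 50 epochs, batch size 128, patience 20) and the
# preprocessing pipeline follow the paper's description.
import torch
import timm
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Preprocessing as described: Resize(256) -> CenterCrop(224) -> ToTensor -> Normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Hypothetical dataset locations; the paper uses ILSVRC-2012 ImageNet.
train_set = datasets.ImageFolder("data/train", transform=preprocess)
val_set = datasets.ImageFolder("data/val", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=8)

device = "cuda" if torch.cuda.is_available() else "cpu"
# The paper takes feature extractors from the timm library; the exact model id is an assumption.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=len(train_set.classes)).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

best_acc, epochs_without_improvement = 0.0, 0
for epoch in range(50):  # at most 50 epochs
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Validation accuracy as the (assumed) early-stopping criterion.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    acc = correct / total

    if acc > best_acc:
        best_acc, epochs_without_improvement = acc, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= 20:  # early stopping with a patience of 20 epochs
            break
```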