Improving Interpretation Faithfulness for Vision Transformers
Authors: Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, Di Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the effectiveness of our approach through comprehensive experiments and evaluations. Results show that FViTs are more robust against adversarial attacks while maintaining the explainability of attention, indicating higher faithfulness. |
| Researcher Affiliation | Academia | King Abdullah University of Science and Technology (KAUST); Provable Responsible AI and Data Analytics (PRADA) Lab; SDAIA-KAUST AI; Lehigh University; University of Georgia; Iowa State University. |
| Pseudocode | Yes | Algorithm 1: FViTs via Denoised Diffusion Smoothing; Algorithm 2: Finding the Faithfulness Region in FViTs |
| Open Source Code | No | The paper references a third-party tool's GitHub repository ('Jacobgil/vit-explain') but does not state that the code for its own methodology is open-source or provide a link for it. |
| Open Datasets | Yes | For the classification task, we use ILSVRC-2012 ImageNet. For segmentation, we use the ImageNet-segmentation subset (Guillaumin et al., 2014), COCO (Lin et al., 2014), and Cityscapes (Cordts et al., 2016). |
| Dataset Splits | No | The paper mentions implementing "early stopping with a criterion of 20 epochs", which implies a validation set, and refers to an "ImageNet-1k sampled validation set" in Table 8. However, it does not provide specific details on the split percentages or counts for training, validation, and test sets to enable full reproduction of the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running its experiments. It only mentions time costs in seconds per image. |
| Software Dependencies | No | The paper mentions using "the timm library" for feature extractors and the "Adam optimizer" but does not specify version numbers for these or for other software components such as Python, PyTorch, or CUDA, which are necessary for a reproducible setup. |
| Experiment Setup | Yes | For the downstream dataset, we then fine-tuned these models using the Adam optimizer with a learning rate of 0.001 for a total of 50 epochs, with a batch size of 128. To prevent overfitting, we implemented early stopping with a criterion of 20 epochs. For data augmentation, we follow the common practice: Resize(256), CenterCrop(224), ToTensor, Normalization. The mean and standard deviation of normalization are both [0.5, 0.5, 0.5]. (See the sketch after this table.) |
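
The setup quoted in the Experiment Setup row maps onto a standard PyTorch/timm fine-tuning script. The following is a minimal sketch of that recipe, not the authors' released code: the model identifier (`vit_base_patch16_224`), the dataset directory paths, and the use of validation accuracy as the early-stopping signal are assumptions, while the optimizer, learning rate, epoch budget, batch size, patience, and preprocessing pipeline follow the paper's description.

```python
# Hedged sketch of the reported fine-tuning setup. Model name, dataset paths,
# and the early-stopping metric are assumptions; the hyperparameters
# (Adam, lr=0.001, 50 epochs, batch size 128, patience 20) and the
# preprocessing pipeline follow the paper's description.
import torch
import timm
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Preprocessing as described: Resize(256) -> CenterCrop(224) -> ToTensor -> Normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Hypothetical dataset locations; the paper uses ILSVRC-2012 ImageNet.
train_set = datasets.ImageFolder("data/train", transform=preprocess)
val_set = datasets.ImageFolder("data/val", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)
val_loader = DataLoader(val_set, batch_size=128, shuffle=False, num_workers=8)

device = "cuda" if torch.cuda.is_available() else "cpu"
# The paper takes feature extractors from the timm library; the exact model id is an assumption.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=len(train_set.classes)).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

best_acc, epochs_without_improvement = 0.0, 0
for epoch in range(50):  # at most 50 epochs
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Validation accuracy as the (assumed) early-stopping criterion.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    acc = correct / total

    if acc > best_acc:
        best_acc, epochs_without_improvement = acc, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= 20:  # early stopping with a patience of 20 epochs
            break
```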