Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Metric-Driven Attributions for Vision Transformers
Authors: Chase Walker, Sumit Jha, Rickard Ewetz
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation demonstrates the proposed MDA method outperforms 7 existing ViT attribution methods by an average of 12% across 12 attribution metrics on the ImageNet dataset for the ViT-base 16×16, ViT-tiny 16×16, and ViT-base 32×32 models. |
| Researcher Affiliation | Academia | Chase Walker¹, Sumit Kumar Jha², Rickard Ewetz¹ — ¹ University of Florida, ² Florida International University |
| Pseudocode | No | The paper describes its methodology in Section 3 using prose and mathematical equations (e.g., Eq. 1, 2, 3, 4, 5, 8, 10, 11, 12) and illustrations (Figure 2, 3, 4) but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is publicly available at https://github.com/chasewalker26/MDA-Metric-Driven-Attributions-for-ViT. |
| Open Datasets | Yes | We perform all experiments using PyTorch (Paszke et al., 2019) and use the ImageNet 2012 validation dataset (Russakovsky et al., 2015) and the ImageNet Segmentation dataset (Guillaumin et al., 2014). |
| Dataset Splits | Yes | We perform all experiments using PyTorch (Paszke et al., 2019) and use the ImageNet 2012 validation dataset (Russakovsky et al., 2015) and the ImageNet Segmentation dataset (Guillaumin et al., 2014). The results in Table 1 compare all 8 attribution methods over 5000 ImageNet images with 5 images per class for the perturbation metrics. |
| Hardware Specification | Yes | The experiments were run on one server with four NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch ('We perform all experiments using PyTorch (Paszke et al., 2019)') but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We employ three ViT models: ViT-base 16×16, ViT-tiny 16×16, and ViT-base 32×32 as defined in the ViT paper (Dosovitskiy et al., 2020). For these tests, all input images are 224×224 px and we use a step size of 224 px, for a total of 224 perturbation steps as in the original implementations. In practice, we set the parameter τ to 0.90. In practice, we employ κ = 0.005 to only strongly attribute patches with more than 0.5% model importance. A user can tune γ to their choosing, but, quantitatively, the best explanation is created with γ = 0. |
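The perturbation schedule quoted in the Experiment Setup row follows from simple arithmetic: a 224×224 px input contains 50,176 pixels, so perturbing 224 pixels per step covers the whole image in exactly 224 steps. A minimal sketch of that bookkeeping (the function name and structure are illustrative, not taken from the paper's code):

```python
# Sketch of the perturbation-step count implied by the paper's setup:
# 224x224 px inputs, 224 pixels perturbed per step -> 224 total steps.
# Names here are illustrative, not from the MDA implementation.

def perturbation_steps(height: int, width: int, step_px: int) -> int:
    """Number of steps needed to perturb every pixel of an image,
    removing step_px pixels per step (assumes an even division)."""
    total_pixels = height * width            # 224 * 224 = 50176
    if total_pixels % step_px != 0:
        raise ValueError("step size must divide the pixel count evenly")
    return total_pixels // step_px

print(perturbation_steps(224, 224, 224))     # 224, matching the paper's setup
```
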