Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Strengthening Interpretability: An Investigative Study of Integrated Gradient Methods
Authors: Shree Singhi, Anupriya Kumari
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted a reproducibility study on Integrated Gradients (IG) based methods and the Important Direction Gradient Integration (IDGI) framework. [...] We also experimentally verify the authors' claims concerning the performance of IDGI over IG-based methods. Additionally, we varied the number of steps used in the Riemann approximation, an essential parameter in all IG methods, and analyzed the corresponding change in results. We also studied the numerical instability of the attribution methods to check the consistency of the saliency maps produced. We developed the complete code to implement IDGI over the baseline IG methods and evaluated them using three metrics since the available code was insufficient for this study. Our code is readily usable and publicly available at https://github.com/ShreeSinghi/TMLR-IDGI. [...] Section 4 Experimental Methodology and Results |
| Researcher Affiliation | Academia | Shree Singhi, Department of Data Science & Artificial Intelligence, Indian Institute of Technology, Roorkee; Anupriya Kumari, Department of Electronics & Communication Engineering, Indian Institute of Technology, Roorkee |
| Pseudocode | Yes | Algorithm 1 Important Direction Integrated Gradient. Inputs: x, f, c, path: [x_1, . . . , x_j, . . . , x_n] |
| Open Source Code | Yes | We developed the complete code to implement IDGI over the baseline IG methods and evaluated them using three metrics since the available code was insufficient for this study. Our code is readily usable and publicly available at https://github.com/ShreeSinghi/TMLR-IDGI. [...] We could not directly use the existing code for our study, which led us to integrate the code for IDGI¹ provided by the authors and use the original implementations² of the authors' code for IG, GIG, and Blur IG. [...] ¹ https://github.com/yangruo1226/IDGI ² https://github.com/PAIR-code/saliency |
| Open Datasets | Yes | Datasets. We used the same dataset as the original paper: the ImageNet validation dataset, which contains 50K test samples with labels and annotations. |
| Dataset Splits | Yes | Datasets. We used the same dataset as the original paper: the ImageNet validation dataset, which contains 50K test samples with labels and annotations. We also tested the explanation methods for each model on the images that the model classified correctly, which varies from 33K to 39K across models. |
| Hardware Specification | Yes | Computational Requirements. We used a single NVIDIA Tesla V100 GPU with 16 GB of VRAM for our reproducibility experiments. |
| Software Dependencies | Yes | Models. We use the PyTorch (1.13.1) pre-trained models: DenseNet121, 169, 201, InceptionV3, MobileNetV2, ResNet50, 101, 151V2, and VGG16, 19. |
| Experiment Setup | Yes | We use the same baseline methods (IG, GIG, and Blur IG) as the authors' original work. Following the implementations of IDGI, we also use the original implementations with default parameters in the authors' code for IG, GIG, and Blur IG. We use the black image as the reference point for IG and GIG. Finally, as previously mentioned, we use different step counts (8, 16, 32, 64, and 128) as an additional experiment beyond the original paper to verify our hypothesis on how sensitive IDGI is to step size. |
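The excerpts above refer to the Riemann approximation shared by all IG methods and to IDGI's per-step redistribution of output changes. The dependency-free Python sketch below illustrates both ideas on a toy differentiable function. It is an illustrative reconstruction from the quoted descriptions, not the authors' implementation; the function names, the `f`/`grad_fn` callables, and the quadratic example are assumptions made for this sketch.

```python
def ig_attributions(x, baseline, grad_fn, steps=32):
    """Integrated Gradients via a midpoint Riemann sum:
    IG_i ≈ (x_i - x'_i) * (1/m) * sum_k  df/dx_i  at points interpolated
    between the baseline x' and the input x. `steps` is the parameter the
    study varies (8, 16, 32, 64, 128)."""
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        a = (k + 0.5) / steps  # midpoint of the k-th subinterval
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def idgi_attributions(x, baseline, f, grad_fn, steps=32):
    """Sketch of the IDGI idea: at each point along the path, the change in
    model output d = f(x_{j+1}) - f(x_j) is redistributed over features in
    proportion to the squared gradient (the "important direction")."""
    n = len(x)
    points = [[b + (k / steps) * (xi - b) for xi, b in zip(x, baseline)]
              for k in range(steps + 1)]
    attr = [0.0] * n
    for xj, xj1 in zip(points[:-1], points[1:]):
        g = grad_fn(xj)
        d = f(xj1) - f(xj)
        norm = sum(gi * gi for gi in g)  # assumes a nonzero gradient
        for i in range(n):
            attr[i] += g[i] * g[i] * d / norm
    return attr


# Toy example: f(v) = sum(v_i^2), so grad f(v) = 2v.
x = [1.0, 2.0, 3.0]
ig = ig_attributions(x, [0.0, 0.0, 0.0], lambda p: [2 * v for v in p], steps=16)
idgi = idgi_attributions(x, [0.5, 0.5, 0.5],
                         lambda p: sum(v * v for v in p),
                         lambda p: [2 * v for v in p], steps=64)
```

Because IDGI's per-step weights sum to one, the attributions telescope to f(x) − f(baseline), a completeness property that makes the sketch easy to sanity-check; the IG sum is exact here because the integrand is linear in the interpolation coefficient.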