Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Strengthening Interpretability: An Investigative Study of Integrated Gradient Methods
Authors: Shree Singhi, Anupriya Kumari
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted a reproducibility study on Integrated Gradients (IG) based methods and the Important Direction Gradient Integration (IDGI) framework. [...] We also experimentally verify the authors' claims concerning the performance of IDGI over IG-based methods. Additionally, we varied the number of steps used in the Riemann approximation, an essential parameter in all IG methods, and analyzed the corresponding change in results. We also studied the numerical instability of the attribution methods to check the consistency of the saliency maps produced. We developed the complete code to implement IDGI over the baseline IG methods and evaluated them using three metrics since the available code was insufficient for this study. Our code is readily usable and publicly available at https://github.com/ShreeSinghi/TMLR-IDGI. [...] Section 4 Experimental Methodology and Results |
| Researcher Affiliation | Academia | Shree Singhi, Department of Data Science & Artificial Intelligence, Indian Institute of Technology, Roorkee; Anupriya Kumari, Department of Electronics & Communication Engineering, Indian Institute of Technology, Roorkee |
| Pseudocode | Yes | Algorithm 1 Important Direction Integrated Gradient. Inputs: x, f, c, path: [x_1, . . . , x_j, . . . , x_n] |
| Open Source Code | Yes | We developed the complete code to implement IDGI over the baseline IG methods and evaluated them using three metrics since the available code was insufficient for this study. Our code is readily usable and publicly available at https://github.com/ShreeSinghi/TMLR-IDGI. [...] We could not directly use the existing code for our study, which led us to integrate the code for IDGI¹ provided by the authors and use the original implementations² of the authors' code for IG, GIG, and Blur IG. [...] ¹ https://github.com/yangruo1226/IDGI ² https://github.com/PAIR-code/saliency |
| Open Datasets | Yes | Datasets. We used the same dataset as the original paper: the ImageNet validation dataset, which contains 50K test samples with labels and annotations. |
| Dataset Splits | Yes | Datasets. We used the same dataset as the original paper: the ImageNet validation dataset, which contains 50K test samples with labels and annotations. We also tested the explanation methods for each model on the images that the model classified correctly, which varies from 33K to 39K across models. |
| Hardware Specification | Yes | Computational Requirements. We used a single NVIDIA Tesla V100 GPU with 16 GB of VRAM for our reproducibility experiments. |
| Software Dependencies | Yes | Models. We use the PyTorch (1.13.1) pre-trained models: DenseNet121, 169, 201, InceptionV3, MobileNetV2, ResNet50, 101, 151V2, and VGG16, 19. |
| Experiment Setup | Yes | We use the same baseline methods (IG, GIG, and Blur IG) as the authors' original work. Following the implementations of IDGI, we also use the original implementations with default parameters in the authors' code for IG, GIG, and Blur IG. We use the black image as the reference point for IG and GIG. Finally, as previously mentioned, we use different step counts (8, 16, 32, 64, and 128) as an additional experiment beyond the original paper to verify our hypothesis on how sensitive IDGI is to step size. |
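The excerpts above refer to the Riemann approximation shared by all IG methods and to IDGI's per-step redistribution of output changes. The dependency-free Python sketch below illustrates both ideas on a toy differentiable function. It is an illustrative reconstruction from the quoted descriptions, not the authors' implementation; the function names, the `f`/`grad_fn` callables, and the quadratic example are assumptions made for this sketch.

```python
def ig_attributions(x, baseline, grad_fn, steps=32):
    """Integrated Gradients via a midpoint Riemann sum:
    IG_i ≈ (x_i - x'_i) * (1/m) * sum_k  df/dx_i  at points interpolated
    between the baseline x' and the input x. `steps` is the parameter the
    study varies (8, 16, 32, 64, 128)."""
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        a = (k + 0.5) / steps  # midpoint of the k-th subinterval
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def idgi_attributions(x, baseline, f, grad_fn, steps=32):
    """Sketch of the IDGI idea: at each point along the path, the change in
    model output d = f(x_{j+1}) - f(x_j) is redistributed over features in
    proportion to the squared gradient (the "important direction")."""
    n = len(x)
    points = [[b + (k / steps) * (xi - b) for xi, b in zip(x, baseline)]
              for k in range(steps + 1)]
    attr = [0.0] * n
    for xj, xj1 in zip(points[:-1], points[1:]):
        g = grad_fn(xj)
        d = f(xj1) - f(xj)
        norm = sum(gi * gi for gi in g)  # assumes a nonzero gradient
        for i in range(n):
            attr[i] += g[i] * g[i] * d / norm
    return attr


# Toy example: f(v) = sum(v_i^2), so grad f(v) = 2v.
x = [1.0, 2.0, 3.0]
ig = ig_attributions(x, [0.0, 0.0, 0.0], lambda p: [2 * v for v in p], steps=16)
idgi = idgi_attributions(x, [0.5, 0.5, 0.5],
                         lambda p: sum(v * v for v in p),
                         lambda p: [2 * v for v in p], steps=64)
```

Because IDGI's per-step weights sum to one, the attributions telescope to f(x) − f(baseline), a completeness property that makes the sketch easy to sanity-check; the IG sum is exact here because the integrand is linear in the interpolation coefficient.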