Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards credible visual model interpretation with path attribution

Authors: Naveed Akhtar, Mohammad A. A. K. Jalwana

ICML 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"We also establish the findings empirically by evaluating the method on multiple datasets, models and evaluation metrics. Extensive experiments show a consistent quantitative and qualitative gain in the results over the baselines." and Section 7, "Empirical Evidence".
Researcher Affiliation	Academia	Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Highway, 6009 Crawley, Australia.
Pseudocode	Yes	"Algorithm 1 Compute Baseline" and "Algorithm 2 Path integration"
Open Source Code	No	No explicit statement about providing open-source code for the methodology described in this paper or a direct link to a repository was found. The paper mentions using 'author-provided codes for these methods' for benchmarking, but not for their own.
Open Datasets	Yes	ImageNet (Deng et al., 2009) and CIFAR-10 (Krizhevsky et al.).
Dataset Splits	Yes	"For each model, the results are averaged over 2,500 images from the ImageNet validation set." and "on 1000 images of CIFAR-10 validation set."
Hardware Specification	Yes	In Table 5, we report the average computational time (in seconds) required by our method and IG for both ImageNet and CIFAR-10 models, computed for NVIDIA RTX 3090 with 24GB RAM using a Pytorch implementation.
Software Dependencies	No	The paper mentions 'Pytorch implementation' but does not provide specific version numbers for Pytorch or any other software libraries used.
Experiment Setup	Yes	For all the methods, we allow 150 steps. Since our technique enables the use of multiple baselines, we use 3. The reported results in the main paper, and qualitative results shown in E of this document use 150 steps, 3 baselines and δ = 5. To perform the initialization, we simply use fixed blur kernels of size 51 for ImageNet images and 7 for CIFAR-10 images. We empirically noticed that with η = 1/255, the logits almost always matched reasonably well after 15 iterations.
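
To make the quoted setup concrete (150 integration steps, attributions averaged over 3 baselines), the following is a minimal sketch of standard integrated gradients on a toy function. It is not the authors' method: the paper's baseline computation (Algorithm 1), blur-kernel initialization, and the δ and η parameters are not reproduced here. The toy `model`, the baseline values, and all function names are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Central-difference gradient of f at x (stand-in for autograd)."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def integrated_gradients(f: Callable[[Sequence[float]], float],
                         x: Sequence[float], baseline: Sequence[float],
                         steps: int = 150) -> List[float]:
    """Midpoint Riemann approximation of integrated gradients along the
    straight-line path from baseline to x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_gradient(f, point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def multi_baseline_attribution(f: Callable[[Sequence[float]], float],
                               x: Sequence[float],
                               baselines: Sequence[Sequence[float]],
                               steps: int = 150) -> List[float]:
    """Average the per-baseline attributions (assumed aggregation rule)."""
    per_baseline = [integrated_gradients(f, x, b, steps) for b in baselines]
    return [sum(col) / len(per_baseline) for col in zip(*per_baseline)]


def model(v: Sequence[float]) -> float:
    """Hypothetical toy 'model' standing in for a network logit."""
    return v[0] ** 2 + 3.0 * v[0] * v[1]


x = [1.0, 2.0]
# Hypothetical baselines; the paper derives its baselines differently.
baselines = [[0.0, 0.0], [0.5, 0.5], [0.2, 0.1]]
attributions = multi_baseline_attribution(model, x, baselines, steps=150)
```

A useful sanity check for any path attribution of this kind is completeness: for each baseline, the attributions should sum to the difference between the model output at the input and at that baseline, so the averaged attributions sum to the average of those differences.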