Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient XAI: A Low-Cost Data Reduction Approach to SHAP Interpretability
Authors: Severin Bachmann
JAIR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through controlled experiments on synthetic datasets, we analyze the stability of SHAP values under Slovin-based subsampling across varying data characteristics, including feature and target types and distributions, and dataset sizes. Our findings reveal a U-shaped trade-off: SHAP values for midranked features remain stable, whereas extreme values exhibit higher fluctuations. |
| Researcher Affiliation | Academia | Corresponding Author. Author's Contact Information: Severin Bachmann, orcid: 0000-0001-5996-4152, EMAIL, Nuremberg Research Institute for Cooperative Studies, Nuremberg, Bavaria, Germany. |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. Figure 1 provides a 'Methodology overview' as a diagram, but not in a pseudocode format. |
| Open Source Code | No | The paper discusses the methodology and application of Slovin's formula but does not provide any explicit statements or links to open-source code for the described work. |
| Open Datasets | No | Through controlled experiments on synthetic datasets, we analyze the stability of SHAP values under Slovin-based subsampling across varying data characteristics, including feature and target types and distributions, and dataset sizes. Our findings reveal a U-shaped trade-off: SHAP values for midranked features remain stable, whereas extreme values exhibit higher fluctuations. The data section then outlines the generation of synthetic datasets in detail. |
| Dataset Splits | Yes | The numbers represent the test dataset sizes, corresponding to a 20% fraction of the total dataset sizes illustrated in Figure 2. |
| Hardware Specification | No | The paper mentions the general concept of GPUs' computational power in the literature review, but it does not specify any particular hardware (e.g., GPU models, CPU models, or memory) used for running the experiments described in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., specific Python libraries like PyTorch or TensorFlow with their versions) that were used to conduct the experiments. |
| Experiment Setup | No | Once the synthetic data is prepared, it is used to train several machine learning models. The selected models include linear regression (LR), extreme gradient boosting (XGB), neural networks (NN), and support vector machines (SVM). Each of these models offers unique advantages, ranging from the simplicity and interpretability of LR to the sophisticated capabilities of NN and XGB for capturing non-linear relationships and interactions. This range covers the common models applied in ML contexts. The paper lists the types of models used (LR, XGB, NN, SVM) but does not provide specific hyperparameter values or training configurations for these models. |
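The table above repeatedly refers to Slovin-based subsampling as the paper's data-reduction step. For readers unfamiliar with it, the following is a minimal sketch of Slovin's formula, n = N / (1 + N·e²); the function name and the choice of margin of error e = 0.05 are illustrative assumptions, not taken from the paper.

```python
def slovin_sample_size(population_size: int, margin_of_error: float = 0.05) -> int:
    """Slovin's formula: n = N / (1 + N * e^2).

    Returns the subsample size n for a dataset of N rows at a given
    margin of error e. Note: the paper under review uses this formula
    to subsample data before computing SHAP values; the parameters
    here are illustrative.
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return int(round(n))

# A dataset of 100,000 rows at e = 0.05 reduces to roughly 400 samples,
# illustrating why SHAP computation over the subsample is far cheaper.
print(slovin_sample_size(100_000))  # -> 398
```

Because n grows sublinearly in N (it saturates near 1/e² for large N), the cost of SHAP estimation on the subsample stays nearly constant as the full dataset grows, which is the efficiency argument the paper's title alludes to.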