Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards Trustable SHAP Scores

Authors: Olivier Létoffé, Xuanxiang Huang, Joao Marques-Silva

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To validate the improvements obtained with υs with respect to υe, we studied the non-boolean classifiers reported in (Huang and Marques-Silva 2024). For each classifier, each of the possible instances is analyzed, and the SHAP scores produced by the tools SHAP and s SHAP are recorded. If an irrelevant feature is assigned an absolute value larger than some other relevant feature, then a mismatch is declared. Table 1 summarizes the results obtained with the two tools, where column SHAP-FRP mismatch shows the number of mismatches obtained with SHAP, and column s SHAP-FRP mismatch shows the number of mismatches obtained with s SHAP. As can be concluded, SHAP produces several mismatches. In contrast, s SHAP produces no mismatch.
Researcher Affiliation	Collaboration	Olivier Létoffé (Univ. Toulouse, France), Xuanxiang Huang (CNRS@CREATE, Singapore), Joao Marques-Silva (ICREA, Univ. Lleida, Spain)
Pseudocode	No	The paper includes mathematical definitions, propositions, and theorems but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The source code of s SHAP is available from https://github.com/XuanxiangHuang/aaai25_code
Open Datasets	No	The paper introduces a case study using a simple regression tree model adapted from a textbook (James et al. 2017) and mentions studying non-boolean classifiers reported in (Huang and Marques-Silva 2024), but it does not provide access information (links, DOIs, specific citations for datasets) for any publicly available datasets used in its own experiments.
Dataset Splits	No	The paper does not specify any training/test/validation dataset splits, percentages, or methodologies. It refers to analyzing instances of classifiers.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run its experiments, such as GPU/CPU models or memory.
Software Dependencies	No	The paper mentions the 'SHAP tool (Lundberg and Lee 2017)' but does not specify its version number or any other software dependencies with their versions.
Experiment Setup	No	The paper describes its methodology for modifying SHAP and evaluating mismatches, but it does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings for any models.
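
The mismatch criterion quoted under Research Type can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the authors' code: the function names `has_mismatch` and `count_mismatches` are hypothetical, the set of irrelevant features is assumed to be computed elsewhere, and the scores in the usage note are made-up values rather than results from the paper.

```python
from typing import Hashable, Iterable, Mapping, Set, Tuple


def has_mismatch(scores: Mapping[Hashable, float],
                 irrelevant: Set[Hashable]) -> bool:
    """Return True if some irrelevant feature has a larger absolute
    score than some relevant feature (the rule quoted above).

    `scores` maps each feature to its score for one instance;
    `irrelevant` is the (assumed, externally supplied) set of
    irrelevant features for that instance.
    """
    relevant_abs = [abs(v) for f, v in scores.items() if f not in irrelevant]
    irrelevant_abs = [abs(v) for f, v in scores.items() if f in irrelevant]
    # With no relevant or no irrelevant features there is nothing to compare.
    if not relevant_abs or not irrelevant_abs:
        return False
    # "Larger than some relevant feature" holds iff the largest irrelevant
    # score exceeds the smallest relevant score.
    return max(irrelevant_abs) > min(relevant_abs)


def count_mismatches(
    instances: Iterable[Tuple[Mapping[Hashable, float], Set[Hashable]]]
) -> int:
    """Count the instances for which a mismatch is declared."""
    return sum(1 for scores, irrelevant in instances
               if has_mismatch(scores, irrelevant))
```

For example, with the illustrative scores `{"x1": 0.1, "x2": -0.4}` and `x2` irrelevant, `has_mismatch` returns `True`, because the irrelevant feature has the larger absolute value. With `{"x1": 0.5, "x2": 0.0}` and the same irrelevant set it returns `False`.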