Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Regional Explanations: Bridging Local and Global Variable Importance

Authors: Salim I. Amoukou, Nicolas Brunel

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our first set of experiments, we evaluate our approach, R-LOCO, by comparing it with three baseline methods: two local attribution techniques, L-SV and LIME, and LOCO, which serves as the global variant of our approach. We conduct experiments in a controlled environment where the ground-truth explanations are known, using independent features to assess each method's performance without confounding bias. Our objective is to demonstrate that R-LOCO, which lies between the local and global regimes, offers greater local fidelity than state-of-the-art local attribution methods and provides a more nuanced understanding of model behavior compared to global approaches.
Researcher Affiliation	Collaboration	Salim I. Amoukou (J.P. Morgan AI Research); Nicolas J-B. Brunel (LaMME, ENSIIE, University Paris Saclay & Capgemini Invent)
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks. Figure 6 provides a diagram of the R-LOCO workflow but not structured pseudocode.
Open Source Code	No	The code will be released upon acceptance.
Open Datasets	Yes	Figures 1 and 2 show the results on the Diabetes and California datasets from Dua and Graff [2017], respectively. In Figure 1, our method consistently outperformed all baselines. While FDT performed best among the baselines, clustering SHAP values (CSHAP) did not outperform standard SHAP.
Dataset Splits	Yes	Each dataset is divided into three parts: training, calibration, and test. Initially, we split the data into a training set (75%) and a test set (25%). The training set is then further split evenly into a new training and calibration set.
Hardware Specification	Yes	We execute all experiments on a MacBook Pro M2, with none of them being particularly costly to conduct.
Software Dependencies	No	For the baselines, we compute L-SV using the exact explainer from the SHAP library. For LIME, we utilize the LIME package with default parameters, as described in the original paper [Ribeiro et al., 2016a]. For R-LOCO, we use XGBoost (default parameters) as the base model to approximate the leave-one-out functions $\hat{f}_0$ and $\hat{f}_{-j}$.
Experiment Setup	Yes	For R-LOCO, we use XGBoost (default parameters) as the base model to approximate the leave-one-out functions $\hat{f}_0$ and $\hat{f}_{-j}$. We use Affinity Propagation with damping = 0.8.
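
The three-way split quoted in the Dataset Splits row (75% training and 25% test, with the training portion then halved into training and calibration sets) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `three_way_split`, the shuffling step, and the seed are assumptions, since the quoted text does not say how samples are ordered or seeded.

```python
import random


def three_way_split(n_samples, test_fraction=0.25, seed=0):
    """Return (train, calibration, test) index lists.

    Hypothetical helper: the shuffle and the seed are illustrative
    assumptions, not details taken from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # First split: hold out the test fraction (25% in the quoted setup).
    n_test = round(n_samples * test_fraction)
    test = indices[:n_test]
    remaining = indices[n_test:]

    # Second split: divide the remaining 75% evenly into
    # a new training set and a calibration set.
    half = len(remaining) // 2
    train = remaining[:half]
    calibration = remaining[half:]
    return train, calibration, test


train_idx, calib_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(calib_idx), len(test_idx))  # 375 375 250
```

With 1000 samples this yields 375 training, 375 calibration, and 250 test indices, so each of the first two parts covers 37.5% of the data.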