Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Regional Explanations: Bridging Local and Global Variable Importance
Authors: Salim I. Amoukou, Nicolas Brunel
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our first set of experiments, we evaluate our approach, R-LOCO, by comparing it with three baseline methods: two local attribution techniques, L-SV and LIME, and LOCO, which serves as the global variant of our approach. We conduct experiments in a controlled environment where the ground-truth explanations are known, using independent features to assess each method's performance without confounding bias. Our objective is to demonstrate that R-LOCO, which lies between the local and global regimes, offers greater local fidelity than state-of-the-art local attribution methods and provides a more nuanced understanding of model behavior compared to global approaches. |
| Researcher Affiliation | Collaboration | Salim I. Amoukou (J.P. Morgan AI Research); Nicolas J-B. Brunel (LaMME, ENSIIE, University Paris Saclay & Capgemini Invent) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Figure 6 provides a diagram of the R-LOCO workflow but not structured pseudocode. |
| Open Source Code | No | The code will be released upon acceptance. |
| Open Datasets | Yes | Figures 1 and 2 show the results on the Diabetes and California datasets from Dua and Graff [2017], respectively. In Figure 1, our method consistently outperformed all baselines. While FDT performed best among the baselines, clustering SHAP values (CSHAP) did not outperform standard SHAP. |
| Dataset Splits | Yes | Each dataset is divided into three parts: training, calibration, and test. Initially, we split the data into a training set (75%) and a test set (25%). The training set is then further split evenly into a new training and calibration set. |
| Hardware Specification | Yes | We execute all experiments on a MacBook Pro M2, with none of them being particularly costly to conduct. |
| Software Dependencies | No | For the baselines, we compute L-SV using the exact explainer from the SHAP library. For LIME, we utilize the LIME package with default parameters, as described in the original paper [Ribeiro et al., 2016a]. For R-LOCO, we use XGBoost (default parameters) as the base model to approximate the leave-one-out functions $\hat{f}_0$ and $\hat{f}_{-j}$. |
| Experiment Setup | Yes | For R-LOCO, we use XGBoost (default parameters) as the base model to approximate the leave-one-out functions $\hat{f}_0$ and $\hat{f}_{-j}$. We use Affinity Propagation with damping = 0.8. |
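
The split procedure quoted in the Dataset Splits row can be illustrated with a minimal sketch. It only works through the arithmetic of the description (75% / 25%, then an even split of the training portion) and is not the authors' code, which had not been released at the time of this summary. The function name `three_way_split`, the shuffling step, and the `seed` parameter are assumptions made for illustration; the paper excerpt does not say whether or how the data are shuffled.

```python
import random


def three_way_split(n_samples, test_fraction=0.25, seed=0):
    """Return (train, calibration, test) index lists for n_samples rows.

    Hypothetical helper: the shuffling and the seed are assumptions,
    not details taken from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Step 1: hold out the test set (25% of all rows by default).
    n_test = round(n_samples * test_fraction)
    test = indices[:n_test]
    remaining = indices[n_test:]

    # Step 2: split the remaining 75% evenly into training and calibration.
    n_train = len(remaining) // 2
    train = remaining[:n_train]
    calibration = remaining[n_train:]
    return train, calibration, test


train, calibration, test = three_way_split(1000)
print(len(train), len(calibration), len(test))  # 375 375 250
```

Under this reading, the final proportions are 37.5% training, 37.5% calibration, and 25% test, so the model is fitted on well under half of each dataset.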