Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Minimizing False-Positive Attributions in Explanations of Non-Linear Models
Authors: Anders Gjølbye, Stefan Haufe, Lars Kai Hansen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PatternLocal on the XAI-TRIS benchmark, artificial lesion MRI benchmark, and an EEG Motor imagery dataset, and compare it with a range of established XAI methods. |
| Researcher Affiliation | Academia | Anders Gjølbye¹, Stefan Haufe²,³,⁴, Lars Kai Hansen¹. ¹Technical University of Denmark; ²Technische Universität Berlin; ³Physikalisch-Technische Bundesanstalt, Berlin; ⁴Charité Universitätsmedizin Berlin |
| Pseudocode | No | The paper describes the PatternLocal method and its formal objective using mathematical equations and textual descriptions, but it does not present structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/gjoelbye/PatternLocal. |
| Open Datasets | Yes | We evaluate PatternLocal on the XAI-TRIS benchmark, artificial lesion MRI benchmark, and an EEG Motor imagery dataset, and compare it with a range of established XAI methods. |
| Dataset Splits | Yes | For training and evaluation, each dataset is split into D_train, D_val, and D_test in a 90/5/5 ratio. |
| Hardware Specification | Yes | All experiments were executed on a local high-performance computing (HPC) cluster equipped with Intel Xeon E5-2650 v4 CPUs (12 cores, 24 threads, 2.20 GHz) and 256 GB RAM per node. No dedicated GPUs were required. Jobs were managed with SLURM 22.05 and ran under Alma Linux 9.5. |
| Software Dependencies | Yes | The codebase is primarily written in Python 3.13.0. Key libraries are: NumPy 2.1.3, PyTorch 2.6 for model definition, PyTorch-Lightning 2.5 for model training and evaluation, scikit-learn 1.6.1 for classical baselines and metrics, hyperopt 0.2.7 for Bayesian optimization, POT 0.9.5 for Earth-Mover-Distance evaluation, Hydra 1.3.2 for experiment handling. |
| Experiment Setup | Yes | Table 2: Hyperparameters used for model training. Initial learning rate: 1 × 10⁻⁴; batch size: 128; LR-scheduler factor: 0.1; patience (LR + early stop): 100 epochs; maximum training epochs: 500; optimiser: Adam; loss function: cross-entropy. |
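
As a rough illustration of the reported 90/5/5 split and the Table 2 training settings, the sketch below shuffles sample indices and partitions them into train, validation, and test subsets, using only the Python standard library. The function name `split_indices`, the seed, the rounding rule, and the key names in `TRAINING_CONFIG` are assumptions made for this example; it is not taken from the authors' repository, which may split and configure things differently.

```python
import random

# Values copied from the "Experiment Setup" row (Table 2 of the paper).
# The key names are hypothetical; only the values come from the source.
TRAINING_CONFIG = {
    "initial_learning_rate": 1e-4,
    "batch_size": 128,
    "lr_scheduler_factor": 0.1,
    "patience_epochs": 100,      # shared by the LR scheduler and early stopping
    "max_epochs": 500,
    "optimiser": "Adam",
    "loss": "cross-entropy",
}


def split_indices(n_samples, ratios=(0.90, 0.05, 0.05), seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train/val/test lists.

    The seed and the rounding rule are assumptions of this sketch.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a given seed
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Assigning the remainder to the test set keeps the three subsets disjoint and exhaustive even when the sample count is not divisible by 20.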