Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Minimizing False-Positive Attributions in Explanations of Non-Linear Models

Authors: Anders Gjølbye, Stefan Haufe, Lars Kai Hansen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate PatternLocal on the XAI-TRIS benchmark, artificial lesion MRI benchmark, and an EEG Motor imagery dataset, and compare it with a range of established XAI methods.
Researcher Affiliation	Academia	Anders Gjølbye (1), Stefan Haufe (2, 3, 4), Lars Kai Hansen (1). (1) Technical University of Denmark; (2) Technische Universität Berlin; (3) Physikalisch-Technische Bundesanstalt, Berlin; (4) Charité Universitätsmedizin Berlin.
Pseudocode	No	The paper describes the PatternLocal method and its formal objective using mathematical equations and textual descriptions, but it does not present structured pseudocode or an algorithm block.
Open Source Code	Yes	Code is available at https://github.com/gjoelbye/PatternLocal.
Open Datasets	Yes	We evaluate PatternLocal on the XAI-TRIS benchmark, artificial lesion MRI benchmark, and an EEG Motor imagery dataset, and compare it with a range of established XAI methods.
Dataset Splits	Yes	For training and evaluation, each dataset is split into D_train, D_val, and D_test in a 90/5/5 ratio.
Hardware Specification	Yes	All experiments were executed on a local high-performance computing (HPC) cluster equipped with Intel Xeon E5-2650 v4 CPUs (12 cores, 24 threads, 2.20 GHz) and 256 GB RAM per node. No dedicated GPUs were required. Jobs were managed with SLURM 22.05 and ran under AlmaLinux 9.5.
Software Dependencies	Yes	The codebase is primarily written in Python 3.13.0. Key libraries are: NumPy 2.1.3, PyTorch 2.6 for model definition, PyTorch-Lightning 2.5 for model training and evaluation, scikit-learn 1.6.1 for classical baselines and metrics, hyperopt 0.2.7 for Bayesian optimization, POT 0.9.5 for Earth-Mover-Distance evaluation, Hydra 1.3.2 for experiment handling.
Experiment Setup	Yes	Table 2: Hyperparameters used for model training. Initial learning rate: 1 × 10^-4; batch size: 128; LR-scheduler factor: 0.1; patience (LR + early stop): 100 epochs; maximum training epochs: 500; optimiser: Adam; loss function: cross-entropy.
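
The 90/5/5 split reported under Dataset Splits can be illustrated with a minimal sketch. This is not the authors' code: the function name split_indices, the fixed seed, and the choice to shuffle before cutting are assumptions made for illustration only, and the repository linked above remains the reference for how the splits were actually produced.

```python
import random


def split_indices(n_samples, percents=(90, 5, 5), seed=0):
    """Shuffle the indices 0..n_samples-1 and cut them into train/val/test lists.

    Hypothetical helper: the percentages follow the 90/5/5 ratio quoted in the
    table; the seed and the shuffling step are assumptions.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n_samples * percents[0] // 100
    n_val = n_samples * percents[1] // 100

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 900 50 50
```

Because the test set takes whatever remains after the train and validation cuts, every sample lands in exactly one subset even when the dataset size is not a multiple of 20.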