Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Practical do-Shapley Explanations with Estimand-Agnostic Causal Inference

Authors: Álvaro Parafita, Tomas Garriga, Axel Brando, Francisco Cazorla

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This section contains the empirical validation of our approach. We begin with a synthetic DGP, from which we can derive ground truth do-SVs, to measure estimation error on several EA methods. Secondly, we demonstrate the speedup resulting from the FRA-cache with an ablation test. Finally, we showcase do-SHAP explanations on two real-world datasets to illustrate the power of do-SHAP explanations (section 5 and appendix F).
Researcher Affiliation	Collaboration	(1) Barcelona Supercomputing Center, (2) Novartis
Pseudocode	Yes	Algorithm 1 Frontier-Reducibility Algorithm (FRA) set version
Open Source Code	Yes	Please refer to the Supplementary Material for the code of these experiments.
Open Datasets	Yes	Here we discuss the Diabetes Health Indicators Dataset [43]... [43] CDC. CDC Diabetes Health Indicators. UCI Machine Learning Repository, 2015. Preprocessed dataset downloaded from DOI: https://doi.org/10.24432/C53919. We now study the Bike Rental Dataset [44], describing the number of rentals... [44] Hadi Fanaee-T and Joao Gama. Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, 2:113-127, 2014.
Dataset Splits	Yes	We generate 1,000 i.i.d. samples from this DGP to create training, validation and test sets with ratios 8:1:1.
Hardware Specification	Yes	These experiments are executed on personal computers (particularly, a Macbook with an M3 Pro chip) and do not require an infrastructure of workers for their execution.
Software Dependencies	No	The paper mentions specific software components like "AdamW optimizer" and "Rational-Quadratic Spline Flow" but does not provide specific version numbers for these components or other key libraries (e.g., Python, PyTorch versions).
Experiment Setup	Yes	In terms of training, we use the AdamW optimizer [42] with Early Stopping (after 100 epochs with no improvement), using learning rate 10^-3, weight decay 10^-2 and batch size 100. Regarding SHAP estimation, since we only have 5 variables, we use the exact permutation method, taking 1,000 samples from each SCM for the Monte Carlo estimators of ν(S).
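
The Dataset Splits and Experiment Setup rows describe an 8:1:1 split of 1,000 samples and an exact permutation method over 5 variables. The following is a minimal sketch of both steps, not the authors' code. The function names, the variable names x1..x5, and the toy value function `toy_value` are all hypothetical. In the paper, ν(S) is estimated by Monte Carlo from SCM samples, whereas here a simple closed-form game stands in for it so the example is self-contained.

```python
import itertools
import math
import random


def split_8_1_1(samples, seed=0):
    """Shuffle and split samples into train/validation/test with ratios 8:1:1."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def exact_shapley(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering of the players (5 players -> 5! = 120 orderings)."""
    totals = {p: 0.0 for p in players}
    for order in itertools.permutations(players):
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            totals[p] += current - previous
            previous = current
    n_orderings = math.factorial(len(players))
    return {p: total / n_orderings for p, total in totals.items()}


# Hypothetical stand-in for nu(S): additive weights plus one interaction
# between x1 and x2. This is NOT the paper's do-SHAP value function.
WEIGHTS = {"x1": 1.0, "x2": 2.0, "x3": 0.5, "x4": -1.0, "x5": 3.0}


def toy_value(coalition):
    base = sum(WEIGHTS[p] for p in coalition)
    bonus = 2.0 if {"x1", "x2"} <= coalition else 0.0
    return base + bonus


train, val, test = split_8_1_1(range(1000))
phi = exact_shapley(list(WEIGHTS), toy_value)
```

With only 5 variables, enumerating all 120 orderings is cheap, which is presumably why an exact method is feasible in this setting. In the toy game, the interaction bonus is shared equally between x1 and x2, and the values sum to ν of the full set minus ν of the empty set.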