Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Practical do-Shapley Explanations with Estimand-Agnostic Causal Inference

Authors: Álvaro Parafita, Tomas Garriga, Axel Brando, Francisco Cazorla

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section contains the empirical validation of our approach. We begin with a synthetic DGP, from which we can derive ground truth do-SVs, to measure estimation error on several EA methods. Secondly, we demonstrate the speedup resulting from the FRA-cache with an ablation test. Finally, we showcase do-SHAP explanations on two real-world datasets to illustrate the power of do-SHAP explanations (section 5 and appendix F).
Researcher Affiliation | Collaboration | 1 Barcelona Supercomputing Center, 2 Novartis
Pseudocode | Yes | Algorithm 1: Frontier-Reducibility Algorithm (FRA), set version
Open Source Code | Yes | Please refer to the Supplementary Material for the code of these experiments.
Open Datasets | Yes | Here we discuss the Diabetes Health Indicators Dataset [43]... [43] CDC. CDC Diabetes Health Indicators. UCI Machine Learning Repository, 2015. Preprocessed dataset downloaded from DOI: https://doi.org/10.24432/C53919. We now study the Bike Rental Dataset [44], describing the number of rentals... [44] Hadi Fanaee-T and Joao Gama. Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, 2:113–127, 2014.
Dataset Splits | Yes | We generate 1,000 i.i.d. samples from this DGP to create training, validation and test sets with ratios 8:1:1.
Hardware Specification | Yes | These experiments are executed on personal computers (particularly, a Macbook with an M3 Pro chip) and do not require an infrastructure of workers for their execution.
Software Dependencies | No | The paper mentions specific software components like "AdamW optimizer" and "Rational-Quadratic Spline Flow" but does not provide specific version numbers for these components or other key libraries (e.g., Python, PyTorch versions).
Experiment Setup | Yes | In terms of training, we use the AdamW optimizer [42] with Early Stopping (after 100 epochs with no improvement), using learning rate 10^-3, weight decay 10^-2 and batch size 100. Regarding SHAP estimation, since we only have 5 variables, we use the exact permutation method, taking 1,000 samples from each SCM for the Monte Carlo estimators of ν(S).
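
The Experiment Setup entry mentions an exact permutation method for 5 variables. As a minimal sketch of what that generally means, the code below averages each variable's marginal contribution over all 5! = 120 orderings. It is not the authors' code: the function name, the variable names, and the toy value function are hypothetical, and the value function is a fixed formula rather than a Monte Carlo estimate of ν(S) drawn from an SCM.

```python
from itertools import permutations
from math import factorial


def exact_permutation_shapley(players, value):
    """Average each player's marginal contribution over every ordering."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            totals[p] += cur - prev  # marginal contribution of p in this order
            prev = cur
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical toy game (an assumption for illustration only):
# an additive payoff per variable plus an interaction between x1 and x2.
WEIGHTS = {"x1": 1.0, "x2": 2.0, "x3": 3.0, "x4": 4.0, "x5": 5.0}


def toy_value(coalition):
    total = sum(WEIGHTS[p] for p in coalition)
    if "x1" in coalition and "x2" in coalition:
        total += 2.0  # interaction term, split evenly by the Shapley value
    return total


phi = exact_permutation_shapley(list(WEIGHTS), toy_value)
for name, val in phi.items():
    print(name, val)
```

With 5 variables the loop makes 120 × 5 value-function calls, which is why exhaustive enumeration is feasible here. The attributions sum to the value of the full coalition minus the value of the empty one (the efficiency property).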