Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Path-specific effects for pulse-oximetry guided decisions in critical care

Authors: Kevin Zhang, Yonghan Jung, Divyat Mahajan, Karthikeyan Shanmugam, Shalmali Joshi

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our methodology is validated on semi-synthetic data and applied to two large real-world health datasets: MIMIC-IV and eICU.
Researcher Affiliation	Collaboration	Kevin Zhang, Columbia University, New York, USA; Yonghan Jung, University of Illinois Urbana-Champaign, Champaign, USA; Divyat Mahajan, Mila & Université de Montréal, Montreal, CA; Karthikeyan Shanmugam, Google DeepMind, Bengaluru, IN; Shalmali Joshi, Columbia University, New York, USA
Pseudocode	No	The paper describes methods and concepts but does not include any explicitly labeled pseudocode or algorithm blocks. It references 'algorithmic fairness' but doesn't present its own algorithm in pseudocode.
Open Source Code	Yes	All the code and instructions for reproducing our experiments are available at https://github.com/reAIM-Lab/PSE-Pulse-Oximetry.
Open Datasets	Yes	To conduct semi-synthetic/real-world experiments, we use two large, publicly available critical care datasets: the eICU Collaborative Research Database (eICU) with ICU admissions from hospitals across the continental U.S. [40], and Medical Information Mart for Intensive Care (MIMIC-IV), with ICU data from the Beth Israel Deaconess Medical Center in Boston [22].
Dataset Splits	Yes	For all experiments, we perform sample-splitting with L = 5 folds and clip propensities within the range [ε, 1 − ε], where ε = 10⁻⁴.
Hardware Specification	Yes	All models were trained on a single NVIDIA RTX A6000 GPU, 32 CPU cores, 256GB of system RAM, running Ubuntu 22.04 with kernel 5.15.
Software Dependencies	No	The paper mentions 'running Ubuntu 22.04 with kernel 5.15' and 'XGBoost', but does not provide specific version numbers for key software libraries or dependencies within the main text or appendices.
Experiment Setup	Yes	The grid search optimizes three hyperparameters: the number of estimators from the set {20, 50, 100, 200}, the maximum tree depth from the set {3, 4, 5, 6}, and the ℓ2-regularization penalty from the set {0.5, 1, 2, 5}, using 5-fold cross-validation.
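
The settings quoted in the Dataset Splits and Experiment Setup rows can be sketched in a few lines of standard-library Python. This is an illustration only, not the authors' code: the function names are hypothetical, the hyperparameter keys (`n_estimators`, `max_depth`, `reg_lambda`) are assumed names for the three quantities listed, and the shuffling seed is arbitrary. The sketch enumerates the search grid, clips a propensity into [ε, 1 − ε], and partitions sample indices into L = 5 disjoint folds.

```python
import itertools
import random

# Hyperparameter grid as quoted in the Experiment Setup row.
# The key names are assumptions; the source only names the quantities.
PARAM_GRID = {
    "n_estimators": [20, 50, 100, 200],
    "max_depth": [3, 4, 5, 6],
    "reg_lambda": [0.5, 1, 2, 5],  # l2-regularization penalty
}

EPSILON = 1e-4  # clipping threshold from the Dataset Splits row
N_FOLDS = 5     # L = 5 folds


def grid_configurations(grid):
    """Return every combination of hyperparameter values in the grid."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*grid.values())]


def clip_propensity(p, eps=EPSILON):
    """Clip an estimated propensity into the range [eps, 1 - eps]."""
    return min(max(p, eps), 1.0 - eps)


def make_folds(n_samples, n_folds=N_FOLDS, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def train_eval_splits(folds):
    """Yield (train_indices, held_out_indices), holding out each fold in turn."""
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, held_out
```

With four candidate values for each of the three hyperparameters, the grid contains 4 × 4 × 4 = 64 configurations, each of which would be scored by 5-fold cross-validation under the setup described above.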