Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death

Authors: Sihyung Park, Wenbin Lu, Shu Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical validation and an application to electronic health records showcase its utility for personalized treatment optimization. [...] Section 5 demonstrates multiply robust off-policy learning using the proposed estimator. Section 6 demonstrates multiple robustness to nuisance model misspecification. We show MR estimator can facilitate decision-making for high-risk patients group by applying it to MIMIC-III database (Section 7).
Researcher Affiliation | Academia | Sihyung Park, Department of Statistics, North Carolina State University, Raleigh, NC 27695, EMAIL; Wenbin Lu, Department of Statistics, North Carolina State University, Raleigh, NC 27695, EMAIL; Shu Yang, Department of Statistics, North Carolina State University, Raleigh, NC 27695, EMAIL
Pseudocode | Yes | Algorithm 1 Compute VMR(π) via cross-fitting.
Open Source Code | Yes | Justification: We have made the code used to generate our results available. The zip file contains a YAML file that can reproduce the same conda environment we used.
Open Datasets | Yes | To illustrate the utility of our proposed methodology, we applied it to the Medical Information Mart for Intensive Care III (MIMIC-III) v1.4 database. MIMIC-III is a publicly accessible, MIT-licensed database containing de-identified health records from over 40,000 patients admitted to critical care units at Beth Israel Deaconess Medical Center between 2001 and 2012. [...] Johnson et al. (2016) provides a detailed description.
Dataset Splits | Yes | In each iteration, stratified sampling on censoring (C1, C2) and survival (S1, S2) indicators was used to create balanced training and test sets. Policies were learned on training data and their value estimated on test data. [...] We used empirical version of this formula with true nuisance models and an independently generated large sample of size 100,000 to compute PCD-AS.
Hardware Specification | Yes | The off-policy learning simulation ran on an internal cluster, with each iteration on a single core, 8 GB RAM instance. Other experiments and the MIMIC-III application used a CPU machine with 16 GB RAM.
Software Dependencies | Yes | Justification: We have made the code used to generate our results available. The zip file contains a YAML file that can reproduce the same conda environment we used.
Experiment Setup | Yes | We employed logistic regression models for estimating the propensity score, censoring and survival probability. For continuous outcome models, we fitted random forest regressors. Lastly, generalized additive models were fitted to estimate the conditional mean functions, mp2 and mµ2. We used a differential evolution algorithm to optimize within the class of linear policies.
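
The Dataset Splits entry describes stratified sampling on the censoring (C1, C2) and survival (S1, S2) indicators. The following is a minimal sketch of that idea using only the Python standard library. It is not the authors' code: the function names, the record layout, the 50/50 split ratio, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical column names; the source only says the split was stratified
# on censoring (C1, C2) and survival (S1, S2) indicators.
STRATA_KEYS = ("C1", "C2", "S1", "S2")


def stratified_split(records, strata_keys=STRATA_KEYS, test_fraction=0.5, seed=0):
    """Split records so every stratum contributes about test_fraction to test.

    test_fraction=0.5 is an assumption; the source does not state the ratio.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in strata_keys)].append(rec)

    train, test = [], []
    for key in sorted(groups):  # sorted for a reproducible stratum order
        members = groups[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def stratum_counts(rows, strata_keys=STRATA_KEYS):
    """Count how many rows fall into each combination of indicator values."""
    return Counter(tuple(r[k] for k in strata_keys) for r in rows)


# Toy data: random binary indicators, purely illustrative.
_rng = random.Random(1)
records = [
    {"id": i, **{k: int(_rng.random() < 0.7) for k in STRATA_KEYS}}
    for i in range(200)
]

train, test = stratified_split(records, test_fraction=0.5, seed=42)
train_counts = stratum_counts(train)
test_counts = stratum_counts(test)
```

With an even split, each combination of indicator values appears in the training and test sets in counts that differ by at most one, which is the sense in which the two sets are balanced here. In a full pipeline, a policy would then be learned on `train` and its value estimated on `test`, as the entry describes.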