Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies

Authors: Runze Yan, Xun Shen, Akifumi Wachi, Sebastien Gros, Anni Zhao, Xiao Hu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate OGSRL through comprehensive experiments on real-world clinical data to validate three key aspects of our framework: (1) the effectiveness of the OOD guardian in constraining policies to in-distribution regions, (2) the ability to learn safe and effective treatment policies that improve upon clinician behavior while satisfying physiological safety constraints, and (3) the generalizability across different critical care conditions. We conduct a detailed evaluation on sepsis treatment using the MIMIC-III dataset (Sections 4.1 and 4.2).
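The response above credits an "OOD guardian" with keeping learned policies inside the data distribution. The exact guardian is not described here, although the paper lists k-NN, KDE, and GPR among the methods it uses. The sketch below shows one generic way a k-NN distance check can flag out-of-distribution (state, action) pairs. It is an illustration under stated assumptions, not the authors' guardian.

The following are all hypothetical choices made for this example:
- the 15-dimensional input (13 state features plus 2 action dimensions);
- the mean-of-k-nearest-distances score;
- the 95th-percentile calibration threshold;
- the function names `knn_ood_score` and `in_distribution`.

```python
import numpy as np

def knn_ood_score(train_sa: np.ndarray, query_sa: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean Euclidean distance from each query row to its k nearest rows in train_sa (hypothetical score)."""
    # Brute-force pairwise distances, shape (n_query, n_train); adequate for a small sketch.
    dists = np.linalg.norm(query_sa[:, None, :] - train_sa[None, :, :], axis=-1)
    # Average the k smallest distances in each row.
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def in_distribution(train_sa: np.ndarray, query_sa: np.ndarray, threshold: float, k: int = 5) -> np.ndarray:
    """True where a query pair lies within the calibrated k-NN distance threshold."""
    return knn_ood_score(train_sa, query_sa, k) <= threshold

# Hypothetical usage on synthetic data: 15-dim pairs (13 state features + 2 action dims).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 15))
calib = rng.normal(size=(200, 15))
# Threshold calibrated on held-out pairs (95th percentile is an assumption).
threshold = np.quantile(knn_ood_score(train, calib), 0.95)
queries = np.stack([np.zeros(15), np.full(15, 10.0)])  # one typical pair, one far-away pair
flags = in_distribution(train, queries, threshold)
```

A distance threshold calibrated on held-out data keeps the check nonparametric. KDE or a GPR's predictive variance could replace the score function without changing the surrounding logic.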
Researcher Affiliation | Collaboration | (1) Emory University, (2) Tokyo University of Agriculture and Technology, (3) LY Corporation, (4) Norwegian University of Science and Technology
Pseudocode | Yes | Algorithm 1 OGSRL: Offline Guarded Safe Reinforcement Learning for Treatment Recommendation
Open Source Code | Yes | Our source code is available at https://github.com/Runz96/Safe RL-OGSRL.
Open Datasets | Yes | When evaluated on the MIMIC-III sepsis treatment dataset, OGSRL demonstrated significantly better OOD handling than baselines. OGSRL achieved a 78% reduction in mortality estimates and a 51% increase in reward compared to clinician decisions. ... We evaluated OGSRL using 18,923 ICU stays with sepsis diagnosis from the MIMIC-III dataset [18] (footnote 3: MIMIC-III dataset: https://physionet.org/content/mimiciii/1.4/). ... To evaluate whether OGSRL generalizes beyond sepsis management, we validate our framework on the Synthetic Acute Hypotension Dataset [27].
Dataset Splits | Yes | We implemented a five-fold cross-validation approach, randomly dividing the data into training (60%), validation (20%), and test (20%) partitions for each seed.
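The quoted protocol pairs "five-fold cross-validation" with a random 60/20/20 partition per seed. The sketch below takes the second reading: five independent random splits, one per seed. That interpretation is an assumption.

Two further choices are assumptions as well:
- It splits at the level of ICU stays rather than individual time steps, so that a patient's trajectory never spans partitions.
- The helper name `split_stays` is hypothetical.

```python
import numpy as np

def split_stays(stay_ids: np.ndarray, seed: int, frac_train: float = 0.6, frac_val: float = 0.2):
    """Randomly partition unique ICU stay IDs into train/val/test for one seed (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    ids = np.unique(stay_ids)            # sorted copy of the unique stay IDs
    rng.shuffle(ids)                     # in-place shuffle, reproducible per seed
    n_train = int(round(frac_train * len(ids)))
    n_val = int(round(frac_val * len(ids)))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]         # remainder (about 20%) goes to test
    return train, val, test

# Hypothetical usage: 18,923 stay IDs (the cohort size quoted above), five seeds.
stays = np.arange(18_923)
splits = {seed: split_stays(stays, seed) for seed in range(5)}
train0, val0, test0 = splits[0]
```

Splitting by stay ID rather than by row avoids leakage between partitions when each stay contributes many transitions.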
Hardware Specification | Yes | All experiments were conducted on a high-performance computing (HPC) cluster equipped with NVIDIA A100 and V100 GPUs.
Software Dependencies | No | The paper mentions several algorithms and models (e.g., CPO [1], GPR [48, 49], KDE [16], k-NN [34, 51]) but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Our implementation addresses key limitations in previous approaches to sepsis treatment optimization. Rather than discretizing interventions or combining multiple treatments into a single dimension, we developed a continuous two-dimensional action space that separately models intravenous fluid administration (IFA) and maximum vasopressor dosage (MVD), namely a = [IFA, MVD] ∈ ℝ². This representation enables more nuanced treatment recommendations, reflecting the clinical reality where physicians simultaneously titrate multiple interventions based on patient response. The state representation emerged from a clinically informed feature selection process, incorporating variables significantly correlated with organ dysfunction. This balanced representation captures essential physiological dynamics while enabling personalized treatment strategies. In total, 13 features are selected as the dynamic state, namely s ∈ ℝ¹³. Departing from previous work that employed mortality as a terminal reward [22], we adapted the Sequential Organ Failure Assessment (SOFA) score into an instantaneous reward signal by setting r : S × A → 1 − SOFA. ... Our approach implements two distinct but complementary safety mechanisms. First, explicit safety constraints are applied to physiological states by enforcing minimum physiological thresholds for oxygen saturation (SpO2 ≥ 92%) [38] and urine output (≥ 0.5 mL/kg/hour) [20].
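The setup above fixes several quantities:
- a 2-D continuous action a = [IFA, MVD];
- a 13-D state;
- an instantaneous reward of 1 − SOFA;
- two minimum physiological thresholds (SpO2 ≥ 92%, urine output ≥ 0.5 mL/kg/hour).

The sketch below encodes these as a transition record, a reward function, and a constraint-cost function of the kind a constrained RL method such as CPO consumes. It is illustrative only, and the following are assumptions:
- the binary 0/1 cost;
- which time step's SOFA score is used;
- keeping SpO2 and urine output as separate fields rather than indexing them out of the 13 features;
- the names `Transition`, `reward`, and `constraint_cost`.

```python
from dataclasses import dataclass
import numpy as np

SPO2_MIN_PCT = 92.0       # minimum oxygen saturation (%), threshold quoted from the source [38]
URINE_MIN_ML_KG_H = 0.5   # minimum urine output (mL/kg/hour), threshold quoted from the source [20]

@dataclass
class Transition:
    state: np.ndarray     # s in R^13; feature ordering is hypothetical
    action: np.ndarray    # a = [IFA, MVD] in R^2
    sofa: float           # SOFA score attached to this step (which time step is an assumption)
    spo2_pct: float       # oxygen saturation checked by the safety constraint
    urine_ml_kg_h: float  # urine output checked by the safety constraint

    def __post_init__(self):
        # Enforce the dimensions stated in the source.
        if self.state.shape != (13,) or self.action.shape != (2,):
            raise ValueError("expected state shape (13,) and action shape (2,)")

def reward(t: Transition) -> float:
    """Instantaneous reward r = 1 - SOFA."""
    return 1.0 - t.sofa

def constraint_cost(t: Transition) -> float:
    """Binary cost (an assumption): 1.0 if either physiological minimum is violated, else 0.0."""
    violated = t.spo2_pct < SPO2_MIN_PCT or t.urine_ml_kg_h < URINE_MIN_ML_KG_H
    return 1.0 if violated else 0.0

# Hypothetical usage with made-up values: SpO2 below 92% triggers a cost.
step = Transition(np.zeros(13), np.array([500.0, 0.1]), sofa=6.0, spo2_pct=90.0, urine_ml_kg_h=0.8)
r, c = reward(step), constraint_cost(step)
```

Here `step` yields r = -5.0 and c = 1.0. Keeping reward and cost as separate signals mirrors the distinction between the objective (lower SOFA) and the safety constraints (physiological minimums).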