Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies
Authors: Runze Yan, Xun Shen, Akifumi Wachi, Sebastien Gros, Anni Zhao, Xiao Hu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate OGSRL through comprehensive experiments on real-world clinical data to validate three key aspects of our framework: (1) the effectiveness of the OOD guardian in constraining policies to in-distribution regions, (2) the ability to learn safe and effective treatment policies that improve upon clinician behavior while satisfying physiological safety constraints, and (3) the generalizability across different critical care conditions. We conduct detailed evaluation on sepsis treatment using the MIMIC-III dataset (Sections 4.1-4.2). |
| Researcher Affiliation | Collaboration | ¹Emory University, ²Tokyo University of Agriculture and Technology, ³LY Corporation, ⁴Norwegian University of Science and Technology |
| Pseudocode | Yes | Algorithm 1 OGSRL: Offline Guarded Safe Reinforcement Learning for Treatment Recommendation |
| Open Source Code | Yes | Our source code is available at https://github.com/Runz96/SafeRL-OGSRL. |
| Open Datasets | Yes | When evaluated on the MIMIC-III sepsis treatment dataset, OGSRL demonstrated significantly better OOD handling than baselines. OGSRL achieved a 78% reduction in mortality estimates and a 51% increase in reward compared to clinician decisions. ... We evaluated OGSRL using 18,923 ICU stays with sepsis diagnosis from the MIMIC-III dataset³ [18] ... ³MIMIC-III dataset: https://physionet.org/content/mimiciii/1.4/. ... To evaluate whether OGSRL generalizes beyond sepsis management, we validate our framework on the Synthetic Acute Hypotension Dataset [27]. |
| Dataset Splits | Yes | We implemented a five-fold cross-validation approach, randomly dividing the data into training (60%), validation (20%), and test (20%) partitions for each seed. |
| Hardware Specification | Yes | All experiments were conducted on a high performance computing (HPC) cluster equipped with NVIDIA A100 and V100 GPUs. |
| Software Dependencies | No | The paper mentions several algorithms and models (e.g., CPO [1], GPR [48, 49], KDE [16], k-NN [34, 51]) but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Our implementation addresses key limitations in previous approaches to sepsis treatment optimization. Rather than discretizing interventions or combining multiple treatments into a single dimension, we developed a continuous two-dimensional action space that separately models intravenous fluid administration (IFA) and maximum vasopressor dosage (MVD), namely $a = [\mathrm{IFA}, \mathrm{MVD}] \in \mathbb{R}^2$. This representation enables more nuanced treatment recommendations, reflecting the clinical reality where physicians simultaneously titrate multiple interventions based on patient response. The state representation emerged from a clinically informed feature selection process, incorporating variables significantly correlated with organ dysfunction. This balanced representation captures essential physiological dynamics while enabling personalized treatment strategies. Totally 13 features are selected as the dynamic state, namely, $s \in \mathbb{R}^{13}$. Departing from previous work that employed mortality as a terminal reward [22], we adapted the Sequential Organ Failure Assessment (SOFA) score into an instantaneous reward signal by setting r : S A 1 SOFA. ... Our approach implements two distinct but complementary safety mechanisms. First, explicit safety constraints are applied to physiological states by enforcing minimum physiological thresholds for oxygen saturation (SpO2) (≥ 92%) [38] and urine output (≥ 0.5 mL/kg/hour) [20]. |
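
The Dataset Splits row describes a random 60% / 20% / 20% partition into training, validation, and test sets for each seed. The following is a minimal sketch of such a partition, not the authors' code. It assumes the split is made at the level of ICU stays and that seeds 0 to 4 are used; the function name `split_indices` and the seed values are hypothetical. Only the 18,923 stay count and the three fractions come from the quoted text.

```python
import random


def split_indices(n_items, seed, train_frac=0.6, val_frac=0.2):
    """Randomly partition range(n_items) into train/validation/test index lists.

    The test set receives whatever remains after the train and validation
    slices, so the three parts always cover every index exactly once.
    """
    rng = random.Random(seed)          # seeded generator for a repeatable shuffle
    indices = list(range(n_items))
    rng.shuffle(indices)

    n_train = int(train_frac * n_items)
    n_val = int(val_frac * n_items)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Assumption: one independent shuffle per seed, with hypothetical seeds 0..4.
N_STAYS = 18923  # number of ICU stays reported in the Open Datasets row
splits = {seed: split_indices(N_STAYS, seed) for seed in range(5)}
```

Splitting by stay rather than by time step keeps all measurements from one admission in the same partition, which is a common precaution against leakage; whether the paper does this is not stated in the excerpt.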
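
The Experiment Setup row mentions explicit safety constraints that enforce minimum thresholds on oxygen saturation (92%) and urine output (0.5 mL/kg/hour). As an illustration only, the sketch below checks a single physiological state against those two minimums and reports the shortfall for each. The names `Vitals`, `constraint_violations`, and `is_safe` are hypothetical, and expressing a violation as a nonnegative shortfall is an assumption; the excerpt does not say how OGSRL encodes the constraints internally.

```python
from dataclasses import dataclass

# Minimum thresholds quoted in the Experiment Setup row.
SPO2_MIN_PERCENT = 92.0
URINE_MIN_ML_PER_KG_HOUR = 0.5


@dataclass
class Vitals:
    """Hypothetical container for the two constrained measurements."""
    spo2_percent: float
    urine_ml_per_kg_hour: float


def constraint_violations(v: Vitals) -> dict:
    """Return how far each measurement falls below its minimum (0.0 if satisfied)."""
    return {
        "spo2": max(0.0, SPO2_MIN_PERCENT - v.spo2_percent),
        "urine_output": max(0.0, URINE_MIN_ML_PER_KG_HOUR - v.urine_ml_per_kg_hour),
    }


def is_safe(v: Vitals) -> bool:
    """A state counts as safe here only if both minimum thresholds are met."""
    return all(shortfall == 0.0 for shortfall in constraint_violations(v).values())
```

For example, `is_safe(Vitals(spo2_percent=95.0, urine_ml_per_kg_hour=0.8))` holds, while a state with an oxygen saturation of 90% does not, because it falls 2 percentage points short of the minimum.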