Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance

Authors: Liwen Sun, Abhineet Agarwal, Aaron Kornblith, Bin Yu, Chenyan Xiong

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on MIMIC-ED-Assist show that ED-Copilot outperforms state-of-the-art tree models while halving time-costs of laboratory testing from four hours to two hours. Our analyses also confirm the benefit of ED-Copilot's personalized modeling approach. In Section 2, we review related work. We discuss MIMIC-ED-Assist and ED-Copilot in Sections 3 and 4 respectively. Sections 5 and 6 discuss our experimental set-up and results.
Researcher Affiliation | Academia | (1) Language Technologies Institute, Carnegie Mellon University; (2) Department of Statistics, University of California, Berkeley; (3) Department of Emergency Medicine & Pediatrics, University of California, San Francisco; (4) Department of EECS, University of California, Berkeley. Correspondence to: Chenyan Xiong <EMAIL>.
Pseudocode | Yes | Algorithm 1 Proximal Policy Optimization (PPO)
Open Source Code | Yes | Our code is available at https://github.com/cxcscmu/ED-Copilot.
Open Datasets | Yes | In collaboration with ED clinicians, we use public patient data to curate MIMIC-ED-Assist, a benchmark for AI systems to suggest laboratory tests that minimize wait time while accurately predicting critical outcomes such as death. MIMIC-ED-Assist is derived from MIMIC-IV (Johnson et al., 2023b) and related datasets (Xie et al., 2022). Our pipeline to create MIMIC-ED-Assist from the MIMIC-IV dataset can be found at https://github.com/cxcscmu/ED-Copilot. After completing a training course and signing a data use agreement regarding patient information privacy, individuals will gain access to MIMIC-IV and can utilize our pipeline to create MIMIC-ED-Assist.
Dataset Splits | Yes | We randomly split the dataset using 80% for training, 10% for validation, and 10% for testing, while ensuring each split has the same class distribution. The validation set is used to tune hyper-parameters.
Hardware Specification | Yes | All experiments, training and hyper-parameter tuning are conducted on one NVIDIA RTX A6000 GPU.
Software Dependencies | No | The paper mentions software like "BioGPT (Luo et al., 2022)" and "stable-baseline3 (Raffin et al., 2021)" but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | Yes | We list the hyper-parameters in Table 8, including the supervised fine-tuning and reinforcement learning stage. In the RL stage, we use grid-search to tune α and β to balance the trade-off between accuracy and cost. The search scope is α ∈ {1/2, 1, 2, 4, 8, 15, 16, 32, 64, 256} and β ∈ {1/10, 1, 10, 100, 1000}.
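
The Dataset Splits and Experiment Setup entries can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function name `stratified_split`, the constants `ALPHAS` and `BETAS`, and the toy labels are hypothetical, and the rule for choosing the best (α, β) pair on the validation set is left out because it is not stated here. The sketch only shows a class-preserving 80/10/10 split and the enumeration of the quoted search scope.

```python
import itertools
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1), seed=0):
    """Split indices into train/val/test while keeping class proportions.

    Hypothetical helper: `fractions` gives the train and validation shares;
    the remainder of each class goes to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Search scope as quoted in the Experiment Setup entry.
ALPHAS = [0.5, 1, 2, 4, 8, 15, 16, 32, 64, 256]
BETAS = [0.1, 1, 10, 100, 1000]
param_grid = list(itertools.product(ALPHAS, BETAS))

# Toy, made-up labels (10% positive) purely to exercise the split.
labels = [1] * 100 + [0] * 900
train, val, test = stratified_split(labels)

print(len(train), len(val), len(test))
print(len(param_grid), "candidate (alpha, beta) pairs")
```

In an actual run, each pair in `param_grid` would be used to train the RL stage and then scored on the validation indices, with the test indices held out until the end.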