ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance
Authors: Liwen Sun, Abhineet Agarwal, Aaron Kornblith, Bin Yu, Chenyan Xiong
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on MIMIC-ED-Assist show that ED-Copilot outperforms state-of-the-art tree models while halving time-costs of laboratory testing from four hours to two hours. Our analyses also confirm the benefit of ED-Copilot's personalized modeling approach. In Section 2, we review related work. We discuss MIMIC-ED-Assist and ED-Copilot in Sections 3 and 4 respectively. Sections 5 and 6 discuss our experimental set-up and results. |
| Researcher Affiliation | Academia | ¹Language Technologies Institute, Carnegie Mellon University; ²Department of Statistics, University of California, Berkeley; ³Department of Emergency Medicine & Pediatrics, University of California, San Francisco; ⁴Department of EECS, University of California, Berkeley. Correspondence to: Chenyan Xiong <cx@cs.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1: Proximal Policy Optimization (PPO); a PPO training sketch follows the table. |
| Open Source Code | Yes | Our code is available at https://github.com/cxcscmu/ED-Copilot. |
| Open Datasets | Yes | In collaboration with ED clinicians, we use public patient data to curate MIMIC-ED-Assist, a benchmark for AI systems to suggest laboratory tests that minimize wait time while accurately predicting critical outcomes such as death. MIMIC-ED-Assist is derived from MIMIC-IV (Johnson et al., 2023b) and related datasets (Xie et al., 2022). Our pipeline to create MIMIC-ED-Assist from the MIMIC-IV dataset can be found at https://github.com/cxcscmu/ED-Copilot. After completing a training course and signing a data use agreement regarding patient information privacy, individuals will gain access to MIMIC-IV and can utilize our pipeline to create MIMIC-ED-Assist. |
| Dataset Splits | Yes | We randomly split the dataset using 80% for training, 10% for validation, and 10% for testing, while ensuring each split has the same class distribution. The validation set is used to tune hyper-parameters. (A stratified-split sketch follows the table.) |
| Hardware Specification | Yes | All experiments, training and hyper-parameter tuning are conducted on one NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions software like "BioGPT (Luo et al., 2022)" and "stable-baselines3 (Raffin et al., 2021)" but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | We list the hyper-parameters in Table 8, covering both the supervised fine-tuning and reinforcement learning stages. In the RL stage, we use grid search to tune α and β to balance the trade-off between accuracy and cost. The search scope is α ∈ {1/2, 1, 2, 4, 8, 15, 16, 32, 64, 256} and β ∈ {1/10, 1, 10, 100, 1000}. (A grid-search sketch follows the table.) |
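
The Pseudocode row cites Algorithm 1 (PPO). As a minimal sketch, assuming the stable-baselines3 library the paper cites (Raffin et al., 2021), PPO training looks like the following; `CartPole-v1` is a placeholder environment, not the paper's lab-test-suggestion MDP:

```python
# Minimal PPO training sketch with stable-baselines3 (the library the paper
# cites). The environment here is a placeholder; ED-Copilot's actual MDP
# suggests laboratory tests and predicts patient outcomes.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")             # placeholder environment
model = PPO("MlpPolicy", env, verbose=0)  # PPO with clipped surrogate objective
model.learn(total_timesteps=10_000)       # run the rollout/update loop
```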
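
The Dataset Splits row describes an 80/10/10 split that preserves class distribution. A minimal sketch using scikit-learn's stratified splitting, assuming a pandas dataframe `df` with a hypothetical `outcome` label column (the released pipeline may name things differently):

```python
# Sketch of the 80/10/10 class-stratified split described in the paper.
import pandas as pd
from sklearn.model_selection import train_test_split

def stratified_split(df: pd.DataFrame, label_col: str = "outcome", seed: int = 0):
    # Hold out 80% for training while preserving the class distribution.
    train_df, rest_df = train_test_split(
        df, test_size=0.2, stratify=df[label_col], random_state=seed
    )
    # Split the remaining 20% evenly into validation and test sets.
    val_df, test_df = train_test_split(
        rest_df, test_size=0.5, stratify=rest_df[label_col], random_state=seed
    )
    return train_df, val_df, test_df
```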
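
The Experiment Setup row specifies a grid search over the reward trade-off coefficients α and β. A sketch of that loop, where `train_and_evaluate` is a hypothetical stand-in for the paper's RL training plus validation run:

```python
# Grid search over the reward trade-off coefficients alpha and beta.
from itertools import product

def train_and_evaluate(alpha: float, beta: float) -> float:
    # Hypothetical stub: train ED-Copilot's RL stage with this (alpha, beta)
    # and return a validation score balancing accuracy against time-cost.
    return 0.0

ALPHAS = [1/2, 1, 2, 4, 8, 15, 16, 32, 64, 256]  # search scope from the paper
BETAS = [1/10, 1, 10, 100, 1000]

best_score, best_config = float("-inf"), None
for alpha, beta in product(ALPHAS, BETAS):
    score = train_and_evaluate(alpha=alpha, beta=beta)
    if score > best_score:
        best_score, best_config = score, (alpha, beta)
print(f"Best (alpha, beta): {best_config}")
```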