Auditing for Human Expertise
Authors: Rohan Alur, Loren Laine, Darrick Li, Manish Raghavan, Devavrat Shah, Dennis Shung
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our test to evaluate whether emergency room physicians incorporate valuable information which is not available to a common algorithmic risk score for patients with acute gastrointestinal bleeding (AGIB). To that end, we utilize patient admissions data collected from the emergency department of a large academic hospital system. Consistent with prior literature, we find that this algorithmic score is an exceptionally sensitive measure of patient risk and one that is highly competitive with physicians' expert assessments. Nonetheless, our test provides strong evidence that physician decisions to either hospitalize or discharge patients with AGIB are incorporating valuable information that is not captured by the screening tool. |
| Researcher Affiliation | Academia | Rohan Alur (EECS, MIT; ralur@mit.edu); Loren Laine (School of Medicine, Yale; loren.laine@yale.edu); Darrick K. Li (School of Medicine, Yale; darrick.li@yale.edu); Manish Raghavan (Sloan, EECS, MIT; mragh@mit.edu); Devavrat Shah (EECS, IDSS, LIDS, SDSC, MIT; devavrat@mit.edu); Dennis Shung (School of Medicine, Yale; dennis.shung@yale.edu) |
| Pseudocode | Yes | G. Pseudocode for Expert Test: In this section we provide pseudocode for Expert Test. Inputs D0, L, K, α, F(·), m(·, ·) are as defined in Section 3. |
| Open Source Code | Yes | Code, data and instructions to replicate our experiments are available at https://github.com/ralur/auditinghuman-expertise. Publication of the results and data associated with the empirical study in Section 5 has been approved by the relevant institutional review board (IRB). |
| Open Datasets | No | No explicit mention of a publicly available dataset with concrete access information (link, DOI, formal citation) was found. The paper states, 'We consider a sample of 3617 patients who presented with AGIB at one of three hospitals in a large academic health system between 2014 and 2018.' |
| Dataset Splits | No | No explicit details on dataset splits (e.g., specific percentages or sample counts for training, validation, or test sets, or references to predefined splits) were provided in the paper's main text. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory, or cloud instances) used for running experiments were mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers were listed in the paper. |
| Experiment Setup | No | No specific experimental setup details (e.g., hyperparameter values like learning rate, batch size, or specific optimizer settings) were explicitly provided in the main text. |
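
The Pseudocode row names the inputs D0, L, K, α, F(·) and m(·, ·) but does not reproduce the procedure. As a rough illustration only, the sketch below shows one way a paired-swap resampling test with inputs of that shape could look. It is not the authors' Expert Test. The reading of each input is an assumption (D0 as observed `(x, y, y_hat)` triples, L as the number of matched pairs, K as the number of resampled datasets, α as the significance level, F as a loss, m as a distance on x), and the function names `paired_swap_test` and `mse`, the greedy matching, and the Monte Carlo p-value are all hypothetical choices made for this example.

```python
import random


def mse(pairs):
    """Mean squared error over (y, y_hat) pairs; stands in for a loss F."""
    return sum((y - y_hat) ** 2 for y, y_hat in pairs) / len(pairs)


def paired_swap_test(data, L, K, alpha, loss=mse,
                     dist=lambda a, b: abs(a - b), seed=0):
    """Hypothetical paired-swap test; returns (p_value, reject).

    data : list of (x, y, y_hat) triples (assumed meaning of D0)
    L    : number of disjoint pairs to match on x
    K    : number of resampled datasets
    alpha: significance level
    loss : function of a list of (y, y_hat) pairs (assumed meaning of F)
    dist : distance between two x values (assumed meaning of m)
    """
    rng = random.Random(seed)
    n = len(data)

    # Greedy matching (an assumption): take the closest disjoint pairs first.
    candidates = sorted(
        (dist(data[i][0], data[j][0]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    used, pairs = set(), []
    for _, i, j in candidates:
        if len(pairs) == L:
            break
        if i not in used and j not in used:
            used.update((i, j))
            pairs.append((i, j))
    if not pairs:
        raise ValueError("need at least two observations and L >= 1")

    outcomes = [(data[i][1], data[j][1]) for i, j in pairs]
    preds = [(data[i][2], data[j][2]) for i, j in pairs]

    def total_loss(pred_pairs):
        # Flatten the matched pairs back into (y, y_hat) rows for the loss.
        rows = []
        for (yi, yj), (pi, pj) in zip(outcomes, pred_pairs):
            rows.append((yi, pi))
            rows.append((yj, pj))
        return loss(rows)

    observed = total_loss(preds)

    # Swap predictions within each pair with probability 1/2, K times, and
    # count how often the resampled loss is at least as good as the observed.
    at_least_as_good = 0
    for _ in range(K):
        swapped = [(pj, pi) if rng.random() < 0.5 else (pi, pj)
                   for pi, pj in preds]
        if total_loss(swapped) <= observed:
            at_least_as_good += 1

    # Standard Monte Carlo p-value with the +1 correction.
    p_value = (1 + at_least_as_good) / (K + 1)
    return p_value, p_value <= alpha
```

The intuition is that if predictions carry no information about outcomes beyond x, swapping them between near-identical x values should not make the loss systematically worse, so a small p-value suggests the predictions use something x does not capture. Counting ties as "at least as good" keeps the sketch conservative: a predictor that is constant within every pair yields a p-value of exactly 1.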