Using AI Uncertainty Quantification to Improve Human Decision-Making
Authors: Laura Marusich, Jonathan Bakdash, Yan Zhou, Murat Kantarcioglu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the impact on human decision-making for instance-level UQ, calibrated using a strict scoring rule, in two online behavioral experiments. In the first experiment, our results showed that UQ was beneficial for decision-making performance compared to only AI predictions. In the second experiment, we found UQ had generalizable benefits for decision-making across a variety of representations for probabilistic information. These results indicate that implementing high quality, instance-level UQ for AI may improve decision-making with real systems compared to AI predictions alone. |
| Researcher Affiliation | Collaboration | DEVCOM Army Research Laboratory; University of Texas at Dallas, Richardson, TX. |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm," nor does it present structured code-like steps for a procedure. |
| Open Source Code | Yes | See supplementary material at https://osf.io/cb762/. |
| Open Datasets | Yes | We assessed our research questions using three different publicly-available and widely-used datasets: the Census, German Credit, and Student Performance datasets from the UCI Machine Learning Repository (Dua & Graff, 2017), described in more detail below. |
| Dataset Splits | No | The paper states: "Each dataset was split into training (70%) and test (30%) data sets." While it mentions train and test splits, it does not explicitly state a validation set split or provide details for one. (A minimal loading-and-splitting sketch appears after the table.) |
| Hardware Specification | Yes | All classification tasks were completed on an Intel Xeon machine with a 2.30GHz CPU. |
| Software Dependencies | No | The paper mentions "jsPsych (De Leeuw, 2015)" and "Just Another Tool for Online Studies (JATOS) https://github.com/JATOS/JATOS" but does not specify their version numbers. It also mentions machine learning models (e.g., random forest) but not the specific libraries and their versions used for implementation. |
| Experiment Setup | Yes | In our study, we aim to provide predictive uncertainty quantification to human decision-makers and use the advantage of knowing the true labels in advance. Therefore, we simplify the problem as sampling predictive confidence from samples of x with a small random disturbance and verify the quality of the uncertainty estimate using a strictly proper scoring rule (Gneiting & Raftery, 2007) before showing it to the human. ... In this study, we set n = 100 and σ₀ = 0.1. ... In the experiment, we let n = 100 and δ = 0.1, which provided sufficient statistical significance and constrained neighborhood choices. ... Each trial of this task included a description of an individual and a two-alternative forced choice for the classification of that individual. Each choice was correct on 50% of the trials, thus chance performance for human decision-making accuracy was 50%. ... After making a decision, participants then entered their confidence in that choice, on a Likert scale of 1 (No Confidence) to 5 (Full Confidence). ... for each participant, we randomly sampled 40 of those 50 instances for the block of test trials ... completed 8 practice trials, followed by 40 test trials. (The uncertainty-sampling sketch after the table illustrates one reading of this procedure.) |
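
The dataset and split details quoted above (UCI datasets, 70%/30% train/test, random forest models) leave the preprocessing unspecified, so the following is a minimal sketch rather than the authors' pipeline (their materials are in the OSF supplement at https://osf.io/cb762/). It loads the UCI German Credit data, applies the stated 70%/30% split, and fits a random forest; the download URL, one-hot encoding, and hyperparameters are assumptions.

```python
# Minimal sketch: UCI German Credit, 70/30 train/test split, random forest.
# The URL, encoding, and hyperparameters are assumptions; the paper reports
# only the 70%/30% split and the model family.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
cols = [f"attr{i}" for i in range(1, 21)] + ["label"]
df = pd.read_csv(url, sep=r"\s+", header=None, names=cols)

X = pd.get_dummies(df.drop(columns="label")).astype(float)  # one-hot encode categorical attributes
y = (df["label"] == 1).astype(int)                          # 1 = good credit, 2 = bad credit

# 70% training / 30% test, as stated in the paper; no validation split is reported.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```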
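The experiment-setup excerpt describes estimating instance-level confidence by sampling perturbed copies of each input and checking the estimate with a strictly proper scoring rule. Continuing from the sketch above (reusing `clf`, `X_test`, `y_test`), the snippet below illustrates one plausible reading: average the model's predicted probability over n = 100 Gaussian-perturbed copies (σ₀ = 0.1) and score the result with the Brier score. The noise model, the specific scoring rule, and applying noise to one-hot features are all assumptions, not the authors' exact procedure.

```python
# Illustrative neighborhood-sampling confidence estimate (assumed reading of the
# paper's description): n = 100 perturbed copies per instance, Gaussian noise with
# sigma = 0.1, predicted probabilities averaged over the copies, then checked with
# the Brier score (a strictly proper scoring rule; lower is better).
import numpy as np
import pandas as pd
from sklearn.metrics import brier_score_loss

def neighborhood_confidence(clf, x_row, n=100, sigma=0.1, rng=None):
    """Mean predicted probability of the positive class over n noisy copies of one instance."""
    if rng is None:
        rng = np.random.default_rng(0)
    noisy = x_row.to_numpy() + rng.normal(0.0, sigma, size=(n, x_row.shape[0]))
    noisy = pd.DataFrame(noisy, columns=x_row.index)  # keep feature names for the classifier
    return clf.predict_proba(noisy)[:, 1].mean()

# Sampled confidence for each test instance, scored against the true labels.
conf = np.array([neighborhood_confidence(clf, row) for _, row in X_test.iterrows()])
print(f"Brier score of sampled confidences: {brier_score_loss(y_test, conf):.3f}")
```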