Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AI Reliance and Decision Quality: Fundamentals, Interdependence, and the Effects of Interventions

Authors: Jakob Schoeffer, Johannes Jakubik, Michael Vössing, Niklas Kühl, Gerhard Satzger

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To further emphasize its practical relevance, we now apply our framework to three empirical studies that investigate the effects of different interventions on reliance behavior and accuracy in AI-assisted decision-making. The studies cover both binary and multi-class decision-making tasks. We propose that the way we document and present the empirical findings can serve as a blueprint for researchers when reporting their own results. ... We apply our proposed framework to these studies to visualize the effects of the respective interventions and to gain a deeper understanding of how the interventions affect the ability of humans to discern between correct and wrong AI recommendations. To this end, we implemented a user interface that allows to customize the visual framework to any empirical study requiring only information on the AI accuracy (Acc_AI), the level of adherence (A), and the final decision-making accuracy (Acc_final).
Researcher Affiliation | Academia | Jakob Schoeffer, EMAIL, University of Groningen, Groningen, The Netherlands; Johannes Jakubik, EMAIL, Karlsruhe Institute of Technology, Karlsruhe, Germany; Michael Vössing, EMAIL, Karlsruhe Institute of Technology, Karlsruhe, Germany; Niklas Kühl, EMAIL, University of Bayreuth, Fraunhofer FIT, Bayreuth, Germany; Gerhard Satzger, EMAIL, Karlsruhe Institute of Technology, Karlsruhe, Germany
Pseudocode | No | The paper describes mathematical propositions (e.g., Proposition 1, Proposition 3, Proposition 4) and equations (e.g., Fβ score, Q metric), but it does not include any sections explicitly labeled as 'Pseudocode' or 'Algorithm', nor any structured code-like procedures.
Open Source Code | Yes | To support this, we also offer an open-source tool built on our framework, which is available at https://github.com/jhnnsjkbk/accuracy-reliance.
Open Datasets | No | The paper states it uses data from previous studies: 'As we have access to the raw data for study 2, we directly use this data to determine accuracy and adherence values. Similarly, for study 3, we are able to directly infer the aggregated values for AI accuracy, adherence, and the final decision-making accuracy across the three conditions weak AI, medium AI, and strong AI from the publicly available raw data.' It also mentions 'the ICPSR dataset' for study 1. However, the paper does not provide concrete access information (links, DOIs, specific repositories) for these datasets within its own text.
Dataset Splits | No | The paper analyzes empirical findings from prior studies and refers to their data. While it mentions conditions like 'in-distribution and out-of-distribution setups' from study 1, it does not provide specific details on how to perform dataset splits (e.g., exact percentages, sample counts, or methodology) for reproducing its analysis. It uses existing data or aggregated values from previous works.
Hardware Specification | No | The paper focuses on developing and applying a theoretical framework to interpret existing empirical studies and does not describe any new experiments with specific hardware requirements. There is no mention of specific CPU, GPU, or other hardware used for running any experiments or computations in this paper.
Software Dependencies | No | The paper states, 'To this end, we implemented a user interface that allows to customize the visual framework to any empirical study...' and provides a GitHub link for the tool. However, it does not specify any particular software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) that would be needed to reproduce the implementation.
Experiment Setup | No | The paper develops a conceptual framework and applies it to interpret existing empirical findings. It does not describe any new experimental setups, such as hyperparameter values, model initialization, or training schedules, as it does not conduct its own machine learning experiments.
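
The Research Type entry notes that the framework needs only three quantities per study: the AI accuracy, the level of adherence, and the final decision-making accuracy. The sketch below illustrates why these three are interdependent. It is not taken from the paper or from the authors' tool. It assumes a binary task (so overriding a wrong AI recommendation always yields a correct decision) and assumes adherence means the share of cases in which the final decision equals the AI recommendation. The names `AccuracyBounds` and `final_accuracy_bounds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyBounds:
    """Lowest and highest final accuracy compatible with the inputs."""
    lowest: float
    highest: float


def final_accuracy_bounds(acc_ai: float, adherence: float) -> AccuracyBounds:
    """Range of final accuracy for a BINARY task (illustrative assumption).

    acc_ai:    share of cases where the AI recommendation is correct.
    adherence: share of cases where the human follows the recommendation.
    """
    for name, value in (("acc_ai", acc_ai), ("adherence", adherence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # x = share of cases where the AI is correct AND the human adheres.
    # It cannot exceed either marginal, and the overlap cannot be smaller
    # than what the two marginals force.
    x_min = max(0.0, acc_ai + adherence - 1.0)
    x_max = min(acc_ai, adherence)

    def final_accuracy(x: float) -> float:
        # Correct decisions = (AI correct, followed) + (AI wrong, overridden).
        wrong_and_overridden = 1.0 - acc_ai - adherence + x
        return x + wrong_and_overridden

    return AccuracyBounds(final_accuracy(x_min), final_accuracy(x_max))


if __name__ == "__main__":
    bounds = final_accuracy_bounds(acc_ai=0.8, adherence=0.9)
    print(f"final accuracy lies in [{bounds.lowest:.2f}, {bounds.highest:.2f}]")
```

Under these assumptions, an AI accuracy of 0.8 combined with an adherence of 0.9 restricts the final accuracy to roughly the interval from 0.7 to 0.9, so reporting adherence alone says little about how well people discern correct from wrong recommendations.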