Does Explainable Artificial Intelligence Improve Human Decision-Making?

Authors: Yasmeen Alufaisan, Laura R. Marusich, Jonathan Z. Bakdash, Yan Zhou, Murat Kantarcioglu (pp. 6618-6626)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using real datasets, we compare objective human decision accuracy without AI (control), with an AI prediction (no explanation), and AI prediction with explanation. We find providing any kind of AI prediction tends to improve user decision accuracy, but no conclusive evidence that explainable AI has a meaningful impact.
Researcher Affiliation | Collaboration | Yasmeen Alufaisan (1), Laura R. Marusich (2), Jonathan Z. Bakdash (3), Yan Zhou (4), Murat Kantarcioglu (4). Affiliations: (1) EXPEC Computer Center at Saudi Aramco, Dhahran 31311, Saudi Arabia; (2) U.S. Army Combat Capabilities Development Command Army Research Laboratory South at the University of Texas at Arlington; (3) U.S. Army Combat Capabilities Development Command Army Research Laboratory South at the University of Texas at Dallas; (4) University of Texas at Dallas, Richardson, TX 75080.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide explicit statements about, or links to, open-source code for the methodology or analysis it describes.
Open Datasets | Yes | COMPAS stands for Correctional Offender Management Profiling for Alternative Sanctions (Angwin et al. 2016). It is a scoring system used to assign risk scores to criminal defendants to determine their likelihood of becoming a recidivist. The data has 6,479 instances and 7 features. [...] Census income (CI) data contains information used to predict individuals' income (Dua and Graff 2017). It has 32,561 instances and 14 features.
Dataset Splits | No | The paper states "We split the data to 60% for training and 40% for testing to allow enough instances for the explanations generated using anchor LIME", but does not specify a separate validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | Yes | We developed the experiment using jsPsych (De Leeuw 2015), and hosted it on the Volunteer Science platform (Radford et al. 2016).
Experiment Setup | Yes | We compared the prediction accuracy of Logistic Regression, a Multi-layer Perceptron Neural Network with two layers of 50 units each, Random Forest, and a Support Vector Machine (SVM) with RBF kernel, and selected the best classifier for each dataset. We chose the Multi-layer Perceptron Neural Network for the Census income data, where it resulted in an overall accuracy of 82%, and the SVM with RBF kernel for the COMPAS data, with an overall accuracy of 68%. We split the data to 60% for training and 40% for testing...
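The model-selection step quoted in the Experiment Setup row can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: a synthetic dataset stands in for COMPAS / Census income (whose exact preprocessing the paper does not give), and hyperparameters other than those named in the quote (two 50-unit MLP layers, RBF kernel, 60/40 split) are assumed defaults.

```python
# Sketch of the described model-selection step, assuming scikit-learn.
# The synthetic dataset is a stand-in for COMPAS / Census income.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=14, random_state=0)

# 60% training / 40% testing, as in the quoted setup.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "MLP (2 x 50 units)": MLPClassifier(hidden_layer_sizes=(50, 50),
                                        max_iter=500, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

# Fit each candidate and keep the one with the best test accuracy.
scores = {name: accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
          for name, clf in candidates.items()}
best = max(scores, key=scores.get)
print(f"best classifier: {best} (accuracy {scores[best]:.3f})")
```

On the real datasets the paper reports the MLP winning for Census income (82%) and the RBF-kernel SVM for COMPAS (68%); the ranking on the synthetic data here will differ.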