Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi

Authors: Ho Chit Siu, Jaime Peña, Edenna Chen, Yutai Zhou, Victor Lopez, Kyle Palko, Kimberlee Chang, Ross Allen

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of the human-AI team performance, we also quantify subjective measures of the human's perceived performance, teamwork, interpretability, trust, and overall preference of AI teammate. (An illustrative sketch of how such score and survey data might be analyzed follows the table.)
Researcher Affiliation | Collaboration | MIT Lincoln Laboratory, {hochit.siu,jdpena,yutai.zhou,chestnut,ross.allen}@ll.mit.edu; MIT Department of Electrical Engineering and Computer Science, edenna@mit.edu; U.S. Air Force Artificial Intelligence Accelerator, {victor.lopez.10,kyle.palko.1}@us.af.mil
Pseudocode | No | The paper describes its methodology through narrative text and statistical-analysis sections, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the AI agents used (Other-Play and Smart Bot) have MIT licenses and references their original sources ([33] and [21]), but it does not state that the authors are releasing their own code for the experimental setup, data collection, or statistical analysis described in this paper.
Open Datasets | No | The paper describes human-AI experiments in the cooperative card game Hanabi, collecting objective game scores and subjective Likert-scale survey responses from participants. It does not use a pre-existing public dataset that would require access information.
Dataset Splits | No | The paper describes human-AI teaming experiments in which participants played multiple games, but it does not involve training, validation, or test dataset splits: the data is collected directly from human-participant interactions rather than used for model development or evaluation.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU models, CPU types, or cloud instance specifications) used to run its experiments.
Software Dependencies | No | The paper mentions specific AI agents (Other-Play, Smart Bot) and a Hanabi game interface, but it does not provide version numbers for these or any other software dependencies required to reproduce the experiment.
Experiment Setup | No | The paper describes the procedure for human participation, including the number of games played and survey details, but it does not provide technical setup details such as hyperparameters, model initialization, or training schedules, as it uses pre-existing AI agents.
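The study's two data streams noted above, objective Hanabi game scores and subjective Likert-scale survey responses, are typically compared between teammate types (rule-based Smart Bot vs. learning-based Other-Play) with non-parametric statistics, since Likert ratings are ordinal and per-condition samples are small. The sketch below is purely illustrative and is not the authors' released analysis code: the variable names, the placeholder data, and the choice of the Mann-Whitney U test are assumptions for demonstration only.

```python
"""Illustrative sketch (not the authors' code) of comparing objective game
scores and subjective Likert ratings between two AI teammate conditions.
All data values and names below are hypothetical placeholders."""
from scipy.stats import mannwhitneyu

# Hypothetical per-game Hanabi scores (0-25) for human teams with each agent.
scores_rule_based = [17, 19, 15, 21, 18, 16, 20]
scores_learned = [12, 14, 10, 15, 13, 11, 16]

# Hypothetical 1-5 Likert ratings of perceived teammate interpretability.
likert_rule_based = [4, 5, 4, 3, 5, 4]
likert_learned = [2, 3, 2, 2, 3, 1]

# Mann-Whitney U handles small, ordinal, or non-normal samples; whether the
# authors used this exact test is an assumption made for this sketch.
comparisons = [
    ("game score", scores_rule_based, scores_learned),
    ("interpretability rating", likert_rule_based, likert_learned),
]
for label, rule_based, learned in comparisons:
    stat, p = mannwhitneyu(rule_based, learned, alternative="two-sided")
    print(f"{label}: U={stat:.1f}, p={p:.3f}")
```

A non-parametric test is sketched here rather than a t-test because Likert responses are ordinal rather than interval-scaled; the same comparison structure would apply to the other subjective measures (teamwork, trust, overall preference) listed in the Research Type row.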