Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi
Authors: Ho Chit Siu, Jaime Peña, Edenna Chen, Yutai Zhou, Victor Lopez, Kyle Palko, Kimberlee Chang, Ross Allen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of the human-AI team performance, we also quantify subjective measures of the humans' perceived performance, teamwork, interpretability, trust, and overall preference of AI teammate. |
| Researcher Affiliation | Collaboration | MIT Lincoln Laboratory, {hochit.siu,jdpena,yutai.zhou,chestnut,ross.allen}@ll.mit.edu MIT Department of Electrical Engineering and Computer Science, edenna@mit.edu U.S. Air Force Artificial Intelligence Accelerator, {victor.lopez.10,kyle.palko.1}@us.af.mil |
| Pseudocode | No | The paper describes its methodology through narrative text and statistical analysis sections, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that the AI agents used (Other-Play and Smart Bot) are released under MIT licenses and cites their original sources ([33] and [21]), but it does not state that the authors are releasing their own code for the experimental setup, data collection, or statistical analysis described in this paper. |
| Open Datasets | No | The paper describes conducting human-AI experiments in the cooperative card game Hanabi and collecting objective game scores and subjective Likert scale survey responses from participants. It does not use a pre-existing public dataset that would require access information. |
| Dataset Splits | No | The paper describes human-AI teaming experiments in which participants played multiple games, but the study does not involve training, validation, or test dataset splits in the sense of model development or evaluation, since the data are collected directly from human participant interactions. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud instance specifications) used to run its experiments. |
| Software Dependencies | No | The paper mentions the use of specific AI agents (Other-Play, Smart Bot) and a Hanabi game interface, but it does not provide specific version numbers for these or any other ancillary software dependencies required to reproduce the experiment. |
| Experiment Setup | No | The paper describes the procedure for human participation in the experiment, including the number of games played and survey details. However, it does not provide specific technical experimental setup details such as hyperparameters, model initialization, or training schedules, as it uses pre-existing AI agents. |
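
The table above notes that the study's data consist of objective game scores and subjective Likert-scale survey responses collected per participant for each AI teammate. As a purely illustrative sketch of how such paired human-AI evaluation data might be compared, the snippet below runs a non-parametric paired test on hypothetical scores and summarizes a hypothetical Likert rating with a median. The numbers, column choices, and the specific test are assumptions for illustration only; they are not the authors' released analysis code.

```python
# Illustrative sketch only: hypothetical per-participant Hanabi scores under the
# two AI-teammate conditions (rule-based vs. learning-based), plus a Likert-style
# preference rating. All values and the choice of test are assumptions, not data
# or code from the paper.
import numpy as np
from scipy import stats

# Hypothetical mean game score per participant with each AI teammate.
rule_based_scores = np.array([17.2, 15.8, 18.0, 16.5, 14.9, 17.7, 16.1, 15.3])
learned_scores = np.array([14.1, 13.5, 15.2, 12.8, 14.0, 13.9, 12.5, 13.1])

# Paired, non-parametric comparison of objective team performance.
stat, p_value = stats.wilcoxon(rule_based_scores, learned_scores)
print(f"Wilcoxon signed-rank: W={stat:.1f}, p={p_value:.3f}")

# Hypothetical 1-5 Likert ratings of teammate preference, summarized with a
# median, which is appropriate for ordinal survey data.
likert_preference = np.array([4, 5, 3, 4, 5, 4, 2, 4])
print(f"Median preference rating: {np.median(likert_preference)}")
```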