Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi
Authors: Ho Chit Siu, Jaime Peña, Edenna Chen, Yutai Zhou, Victor Lopez, Kyle Palko, Kimberlee Chang, Ross Allen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of the human-AI team performance, we also quantify subjective measures of the humans' perceived performance, teamwork, interpretability, trust, and overall preference of AI teammate. |
| Researcher Affiliation | Collaboration | MIT Lincoln Laboratory, {hochit.siu,jdpena,yutai.zhou,chestnut,ross.allen}@ll.mit.edu MIT Department of Electrical Engineering and Computer Science, edenna@mit.edu U.S. Air Force Artificial Intelligence Accelerator, {victor.lopez.10,kyle.palko.1}@us.af.mil |
| Pseudocode | No | The paper describes its methodology through narrative text and statistical analysis sections, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that the AI agents used (Other-Play and Smart Bot) are released under MIT licenses and cites their original sources ([33] and [21]), but it does not state that the authors are releasing their own code for the experimental setup, data collection, or statistical analysis described in this paper. |
| Open Datasets | No | The paper describes conducting human-AI experiments in the cooperative card game Hanabi and collecting objective game scores and subjective Likert scale survey responses from participants. It does not use a pre-existing public dataset that would require access information. |
| Dataset Splits | No | The paper describes human-AI teaming experiments in which participants played multiple games, but the study does not involve training, validation, or test dataset splits in the sense of model development or evaluation, since the data are collected directly from human participant interactions. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud instance specifications) used to run its experiments. |
| Software Dependencies | No | The paper mentions the use of specific AI agents (Other-Play, Smart Bot) and a Hanabi game interface, but it does not provide specific version numbers for these or any other ancillary software dependencies required to reproduce the experiment. |
| Experiment Setup | No | The paper describes the procedure for human participation in the experiment, including the number of games played and survey details. However, it does not provide specific technical experimental setup details such as hyperparameters, model initialization, or training schedules, as it uses pre-existing AI agents. |
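
The table above notes that the study's data consist of objective game scores and subjective Likert-scale survey responses collected per participant for each AI teammate. As a purely illustrative sketch of how such paired human-AI evaluation data might be compared, the snippet below runs a non-parametric paired test on hypothetical scores and summarizes a hypothetical Likert rating with a median. The numbers, column choices, and the specific test are assumptions for illustration only; they are not the authors' released analysis code.

```python
# Illustrative sketch only: hypothetical per-participant Hanabi scores under the
# two AI-teammate conditions (rule-based vs. learning-based), plus a Likert-style
# preference rating. All values and the choice of test are assumptions, not data
# or code from the paper.
import numpy as np
from scipy import stats

# Hypothetical mean game score per participant with each AI teammate.
rule_based_scores = np.array([17.2, 15.8, 18.0, 16.5, 14.9, 17.7, 16.1, 15.3])
learned_scores = np.array([14.1, 13.5, 15.2, 12.8, 14.0, 13.9, 12.5, 13.1])

# Paired, non-parametric comparison of objective team performance.
stat, p_value = stats.wilcoxon(rule_based_scores, learned_scores)
print(f"Wilcoxon signed-rank: W={stat:.1f}, p={p_value:.3f}")

# Hypothetical 1-5 Likert ratings of teammate preference, summarized with a
# median, which is appropriate for ordinal survey data.
likert_preference = np.array([4, 5, 3, 4, 5, 4, 2, 4])
print(f"Median preference rating: {np.median(likert_preference)}")
```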