“Other-Play” for Zero-Shot Coordination
Authors: Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents. ... (Section 6, Experiments:) We evaluate OP in two different settings. |
| Researcher Affiliation | Industry | 1Facebook AI Research, USA. |
| Pseudocode | No | The paper describes the 'other-play' algorithm and its implementation but does not present any formal pseudocode blocks or figures. (A hedged sketch of the OP objective on the paper's lever example appears below the table.) |
| Open Source Code | Yes | The code is available as a notebook and can be executed online without downloading: https://bit.ly/2vYkfI7. |
| Open Datasets | Yes | We construct agents for the cooperative card game Hanabi, which has recently been established as a benchmark environment for multi-agent decision making in partially observable settings (Bard et al., 2020). We recruited 20 individuals from a board game club... using the user interface open-sourced by Lerer et al. (2019). |
| Dataset Splits | No | The paper mentions training agents and evaluating them through cross-play, but it does not provide specific training, validation, and test dataset splits or percentages required for reproduction. |
| Hardware Specification | No | The paper states: “First, we use 2 GPUs for simulation instead of 1 as in the original paper.” This is a generic mention of GPUs without specific model numbers or other hardware details. |
| Software Dependencies | No | The paper mentions using the 'Simplified Action Decoder (SAD)' and 'deep reinforcement learning (deep RL) based methods' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use the open-sourced implementation of SAD as well as most of its hyper-parameters, but with two major modifications. First, we use 2 GPUs for simulation instead of 1 as in the original paper... Second, we introduce extra hyper-parameters that control the network architecture to add diversity to the model capacity in order to better demonstrate the effectiveness of OP. Specifically, the network can have either 1 or 2 fully connected layers before 2 LSTM layers and can have an optional residual connection to bypass the LSTM layers... We train agents with the aforementioned 4 different network architectures. We run each hyper-parameter configuration with 3 different seeds... (A sketch of these architecture variants also appears below the table.) |
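
Since the paper gives no formal pseudocode, here is a minimal, hedged sketch of the other-play idea on the paper's one-shot lever example: ten levers, nine paying 1.0 and one paying 0.9. Under OP, each player's policy is relabeled by an independent random permutation of the interchangeable levers, so the symmetry-averaged objective makes the distinguishable 0.9 lever the unique optimum. The brute-force argmax over deterministic policies is an illustrative assumption, not the authors' training procedure.

```python
import numpy as np

# Lever game from the paper: nine identical 1.0 levers plus one 0.9 lever.
payoffs = np.array([1.0] * 9 + [0.9])

def sp_value(a):
    # Self-play value: both copies of the policy pull the same lever.
    return payoffs[a]

def op_value(a):
    # Other-play value: each copy is relabeled by an independent random
    # permutation of the nine interchangeable levers, so two copies choosing
    # a symmetric lever match only with probability 1/9, while the
    # distinguishable 0.9 lever is fixed by every symmetry and always matches.
    return payoffs[a] / 9.0 if a < 9 else payoffs[a]

best_sp = int(np.argmax([sp_value(a) for a in range(len(payoffs))]))
best_op = int(np.argmax([op_value(a) for a in range(len(payoffs))]))
print("SP optimum:", best_sp, "OP optimum:", best_op)
# SP is indifferent among the nine 1.0 levers (argmax returns lever 0), so
# independently trained SP pairs miscoordinate in cross-play; OP uniquely
# selects lever 9 and coordinates zero-shot with any other OP agent.
```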
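The Experiment Setup row describes four architecture variants: 1 or 2 fully connected layers feeding 2 LSTM layers, with an optional residual connection bypassing the LSTM. The following PyTorch sketch shows one plausible reading of those variants; the hidden size, input dimension, and action count are placeholder assumptions, not values confirmed by the paper or the SAD codebase.

```python
import torch.nn as nn

class SADNetVariant(nn.Module):
    """Sketch of the four architecture variants: num_fc in {1, 2} fully
    connected layers before 2 LSTM layers, with an optional residual
    connection that bypasses the LSTM. Dimensions are placeholders."""
    def __init__(self, in_dim, n_actions, num_fc=1, residual=False, hid=512):
        super().__init__()
        layers = [nn.Linear(in_dim, hid), nn.ReLU()]
        for _ in range(num_fc - 1):
            layers += [nn.Linear(hid, hid), nn.ReLU()]
        self.fc = nn.Sequential(*layers)
        self.lstm = nn.LSTM(hid, hid, num_layers=2, batch_first=True)
        self.residual = residual
        self.head = nn.Linear(hid, n_actions)

    def forward(self, obs, hidden=None):
        x = self.fc(obs)                    # (batch, time, hid)
        out, hidden = self.lstm(x, hidden)
        if self.residual:                   # optional bypass around the LSTM
            out = out + x
        return self.head(out), hidden

# One agent per architecture variant, as in the paper's diversity experiment;
# input/action sizes here are illustrative, not Hanabi's actual dimensions.
variants = [SADNetVariant(in_dim=512, n_actions=20, num_fc=f, residual=r)
            for f in (1, 2) for r in (False, True)]
```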