“Other-Play” for Zero-Shot Coordination
Authors: Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents. ... (Section 6, Experiments:) We evaluate OP in two different settings. |
| Researcher Affiliation | Industry | 1Facebook AI Research, USA. |
| Pseudocode | No | The paper describes the 'other-play' algorithm and its implementation but does not present any formal pseudocode blocks or figures. (A hedged sketch of the OP objective on the paper's lever example appears below the table.) |
| Open Source Code | Yes | The code is available as a notebook and can be executed online without downloading: https://bit.ly/2vYkfI7. |
| Open Datasets | Yes | We construct agents for the cooperative card game Hanabi, which has recently been established as a benchmark environment for multi-agent decision making in partially observable settings (Bard et al., 2020). We recruited 20 individuals from a board game club... using the user interface open-sourced by Lerer et al. (2019). |
| Dataset Splits | No | The paper mentions training agents and evaluating them through cross-play, but it does not provide specific training, validation, and test dataset splits or percentages required for reproduction. |
| Hardware Specification | No | The paper states: “First, we use 2 GPUs for simulation instead of 1 as in the original paper.” This is a generic mention of GPUs without specific model numbers or other hardware details. |
| Software Dependencies | No | The paper mentions using the 'Simplified Action Decoder (SAD)' and 'deep reinforcement learning (deep RL) based methods' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use the open-sourced implementation of SAD as well as most of its hyper-parameters, but with two major modifications. First, we use 2 GPUs for simulation instead of 1 as in the original paper... Second, we introduce extra hyper-parameters that control the network architecture to add diversity to the model capacity in order to better demonstrate the effectiveness of OP. Specifically, the network can have either 1 or 2 fully connected layers before 2 LSTM layers and can have an optional residual connection to bypass the LSTM layers... We train agents with the aforementioned 4 different network architectures. We run each hyper-parameter configuration with 3 different seeds... (A sketch of these architecture variants also appears below the table.) |
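
Since the paper gives no formal pseudocode, here is a minimal, hedged sketch of the other-play idea on the paper's one-shot lever example: ten levers, nine paying 1.0 and one paying 0.9. Under OP, each player's policy is relabeled by an independent random permutation of the interchangeable levers, so the symmetry-averaged objective makes the distinguishable 0.9 lever the unique optimum. The brute-force argmax over deterministic policies is an illustrative assumption, not the authors' training procedure.

```python
import numpy as np

# Lever game from the paper: nine identical 1.0 levers plus one 0.9 lever.
payoffs = np.array([1.0] * 9 + [0.9])

def sp_value(a):
    # Self-play value: both copies of the policy pull the same lever.
    return payoffs[a]

def op_value(a):
    # Other-play value: each copy is relabeled by an independent random
    # permutation of the nine interchangeable levers, so two copies choosing
    # a symmetric lever match only with probability 1/9, while the
    # distinguishable 0.9 lever is fixed by every symmetry and always matches.
    return payoffs[a] / 9.0 if a < 9 else payoffs[a]

best_sp = int(np.argmax([sp_value(a) for a in range(len(payoffs))]))
best_op = int(np.argmax([op_value(a) for a in range(len(payoffs))]))
print("SP optimum:", best_sp, "OP optimum:", best_op)
# SP is indifferent among the nine 1.0 levers (argmax returns lever 0), so
# independently trained SP pairs miscoordinate in cross-play; OP uniquely
# selects lever 9 and coordinates zero-shot with any other OP agent.
```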
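The Experiment Setup row describes four architecture variants: 1 or 2 fully connected layers feeding 2 LSTM layers, with an optional residual connection bypassing the LSTM. The following PyTorch sketch shows one plausible reading of those variants; the hidden size, input dimension, and action count are placeholder assumptions, not values confirmed by the paper or the SAD codebase.

```python
import torch.nn as nn

class SADNetVariant(nn.Module):
    """Sketch of the four architecture variants: num_fc in {1, 2} fully
    connected layers before 2 LSTM layers, with an optional residual
    connection that bypasses the LSTM. Dimensions are placeholders."""
    def __init__(self, in_dim, n_actions, num_fc=1, residual=False, hid=512):
        super().__init__()
        layers = [nn.Linear(in_dim, hid), nn.ReLU()]
        for _ in range(num_fc - 1):
            layers += [nn.Linear(hid, hid), nn.ReLU()]
        self.fc = nn.Sequential(*layers)
        self.lstm = nn.LSTM(hid, hid, num_layers=2, batch_first=True)
        self.residual = residual
        self.head = nn.Linear(hid, n_actions)

    def forward(self, obs, hidden=None):
        x = self.fc(obs)                    # (batch, time, hid)
        out, hidden = self.lstm(x, hidden)
        if self.residual:                   # optional bypass around the LSTM
            out = out + x
        return self.head(out), hidden

# One agent per architecture variant, as in the paper's diversity experiment;
# input/action sizes here are illustrative, not Hanabi's actual dimensions.
variants = [SADNetVariant(in_dim=512, n_actions=20, num_fc=f, residual=r)
            for f in (1, 2) for r in (False, True)]
```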