Human-Level Performance in No-Press Diplomacy via Equilibrium Search
Authors: Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via regret minimization. ... We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and ranks in the top 2% of human players when playing anonymous games on a popular Diplomacy website. (A regret-matching sketch follows the table.) |
| Researcher Affiliation | Industry | Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown; Facebook AI Research; {jsgray,alerer,yolo,noambrown}@fb.com |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links to open-source code for the described methodology or explicitly state that the code will be released. |
| Open Datasets | No | The paper states, 'We thank Kestas Kuliukas and the entire webdiplomacy.net team for their cooperation and for providing the dataset used in this research.' but does not provide concrete access information (link, DOI, or explicit public availability) for the dataset. |
| Dataset Splits | No | The paper refers to a 'corpus of 46,148 Diplomacy games' and 'training data', but does not explicitly provide training, validation, or test dataset splits. |
| Hardware Specification | Yes | The search algorithm used ... typically required between 2 minutes and 20 minutes per turn using a single Volta GPU and 8 CPU cores, depending on the hyperparameters used for the game. |
| Software Dependencies | No | The paper does not explicitly list its key software dependencies or their version numbers. Components such as PyTorch or CUDA can at best be inferred from context (e.g., references to deep neural network training on GPUs), but no dependency is named with a specific version. |
| Experiment Setup | Yes | In non-live games, we typically ran RM for 2,048 iterations with a rollout length of 3 movement phases, and set M (the constant which is multiplied by the number of units to determine the number of subgame actions) equal to 5. ... In live games...we ran RM for 256 iterations with a rollout length of 2 movement phases, and set M equal to 3.5. ... In all cases, the temperature for the blueprint in rollouts was set to 0.75. (These settings are collected in the configuration sketch after the table.) |
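
For context on the method quoted in the Research Type row, below is a minimal single-player sketch of regret matching (RM), the equilibrium-search procedure the paper uses for one-step lookahead. The function name, the `payoff_fn` oracle, and the one-player framing are illustrative assumptions, not the authors' implementation: the paper runs RM simultaneously for all seven powers and estimates action values with Monte Carlo rollouts of the blueprint policy.

```python
import numpy as np

def regret_matching(payoff_fn, num_actions, num_iters=2048):
    """One-player sketch of regret matching over a fixed candidate action set.

    `payoff_fn(a)` is a hypothetical oracle returning the (possibly sampled)
    value of action index `a`; in the paper this value would come from
    blueprint-policy rollouts against the other powers' current strategies.
    """
    regrets = np.zeros(num_actions)
    strategy_sum = np.zeros(num_actions)

    for _ in range(num_iters):
        # Play proportionally to positive regret; uniform when all regrets <= 0.
        pos = np.maximum(regrets, 0.0)
        total = pos.sum()
        sigma = pos / total if total > 0 else np.full(num_actions, 1.0 / num_actions)
        strategy_sum += sigma

        # Accumulate regret of each action against the current strategy's value.
        values = np.array([payoff_fn(a) for a in range(num_actions)], dtype=float)
        regrets += values - sigma @ values

    # The average strategy over iterations is the approximate equilibrium policy.
    return strategy_sum / strategy_sum.sum()

# Toy usage: the average strategy concentrates on the highest-payoff action.
policy = regret_matching(lambda a: float(a == 2), num_actions=4)
```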
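
The search hyperparameters quoted in the Experiment Setup row can be summarized in a configuration sketch. All field names below are hypothetical, chosen only to make the quoted settings explicit; they do not come from the authors' codebase.

```python
# Non-live games: the slower, higher-quality search regime quoted above.
NON_LIVE_GAME_SEARCH = dict(
    rm_iterations=2048,          # regret-minimization iterations per turn
    rollout_length=3,            # movement phases rolled out per iteration
    actions_per_unit=5,          # M: subgame actions = M * number of units
    blueprint_temperature=0.75,  # sampling temperature for blueprint rollouts
)

# Live games: a faster regime with fewer iterations and shorter rollouts.
LIVE_GAME_SEARCH = dict(
    rm_iterations=256,
    rollout_length=2,
    actions_per_unit=3.5,
    blueprint_temperature=0.75,
)
```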