Human-Level Performance in No-Press Diplomacy via Equilibrium Search

Authors: Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via regret minimization. ... We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and ranks in the top 2% of human players when playing anonymous games on a popular Diplomacy website. (A minimal regret-matching illustration follows the table.)
Researcher Affiliation | Industry | Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown; Facebook AI Research; {jsgray,alerer,yolo,noambrown}@fb.com
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to open-source code for the described methodology, nor does it explicitly state that the code will be released.
Open Datasets | No | The paper states, 'We thank Kestas Kuliukas and the entire webdiplomacy.net team for their cooperation and for providing the dataset used in this research,' but does not provide concrete access information (a link, DOI, or explicit statement of public availability) for the dataset.
Dataset Splits | No | The paper refers to a 'corpus of 46,148 Diplomacy games' and to 'training data', but does not explicitly specify training, validation, or test dataset splits.
Hardware Specification | Yes | The search algorithm used ... typically required between 2 minutes and 20 minutes per turn using a single Volta GPU and 8 CPU cores, depending on the hyperparameters used for the game.
Software Dependencies | No | The paper implicitly references software components such as Python, PyTorch, and CUDA through context (e.g., 'deep reinforcement learning', 'neural networks'), but it does not specify explicit version numbers for any key software dependency.
Experiment Setup | Yes | In non-live games, we typically ran RM for 2,048 iterations with a rollout length of 3 movement phases, and set M (the constant which is multiplied by the number of units to determine the number of subgame actions) equal to 5. ... In live games ... we ran RM for 256 iterations with a rollout length of 2 movement phases, and set M equal to 3.5. ... In all cases, the temperature for the blueprint in rollouts was set to 0.75. (See the configuration sketch after the table.)
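
The Research Type row quotes the abstract's description of one-step lookahead search via regret minimization (RM). The snippet below is a minimal, self-contained illustration of the regret-matching update on a two-player zero-sum matrix game; it is not the authors' implementation, which runs RM over candidate Diplomacy action sets sampled from the blueprint policy, with utilities estimated by truncated blueprint rollouts rather than read from a payoff matrix.

import numpy as np

def regret_matching(payoff, iterations=2048):
    """Regret matching for both players of a zero-sum matrix game.

    payoff[i, j] is the row player's utility when the row player chooses
    action i and the column player chooses action j (the column player
    receives -payoff[i, j]). Returns each player's average strategy,
    which approximates a Nash equilibrium of the matrix game.
    """
    n_row, n_col = payoff.shape
    regrets = [np.zeros(n_row), np.zeros(n_col)]
    strategy_sums = [np.zeros(n_row), np.zeros(n_col)]

    def current_strategy(r):
        # Play in proportion to positive regret; fall back to uniform.
        pos = np.maximum(r, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(r.size, 1.0 / r.size)

    for _ in range(iterations):
        s_row = current_strategy(regrets[0])
        s_col = current_strategy(regrets[1])
        strategy_sums[0] += s_row
        strategy_sums[1] += s_col

        # Expected utility of each pure action against the opponent's current mix.
        u_row = payoff @ s_col        # row player's per-action utilities
        u_col = -(s_row @ payoff)     # column player's per-action utilities

        # Accumulate instantaneous regrets (action utility minus realized value).
        regrets[0] += u_row - s_row @ u_row
        regrets[1] += u_col - s_col @ u_col

    return strategy_sums[0] / iterations, strategy_sums[1] / iterations


if __name__ == "__main__":
    # Matching pennies: the unique equilibrium mixes 50/50 for both players,
    # and the average RM strategies converge toward that mix.
    pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
    print(regret_matching(pennies))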
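
As a compact reference for the Experiment Setup row, the reported search hyperparameters can be grouped into a configuration dictionary. The key names below are hypothetical and do not come from the authors' codebase; only the values are the paper's reported settings.

# Illustrative grouping of the search hyperparameters quoted above;
# key names are hypothetical, values are the paper's reported settings.
SEARCH_HYPERPARAMS = {
    "non_live_games": {
        "rm_iterations": 2048,     # regret-minimization iterations per turn
        "rollout_length": 3,       # movement phases rolled out per subgame action
        "actions_per_unit_M": 5,   # M, multiplied by unit count to cap subgame actions
    },
    "live_games": {
        "rm_iterations": 256,
        "rollout_length": 2,
        "actions_per_unit_M": 3.5,
    },
    "blueprint_rollout_temperature": 0.75,  # sampling temperature used in all cases
}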