No-Press Diplomacy from Scratch

Authors: Anton Bakhtin, David Wu, Adam Lerer, Noam Brown

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Using this algorithm, we train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance. Additionally, we extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data. We present evidence that this agent plays a strategy that is incompatible with human-data bootstrapped agents. This presents the first strong evidence of multiple equilibria in Diplomacy and suggests that self-play alone may be insufficient for achieving superhuman performance in Diplomacy." |
| Researcher Affiliation | Industry | Anton Bakhtin, David Wu, Adam Lerer, Noam Brown (Facebook AI Research) {yolo,dwu,alerer,noambrown}@fb.com |
| Pseudocode | Yes | "Algorithm 1 Approximated Double Oracle" (a generic double-oracle sketch follows this table) |
| Open Source Code | Yes | "The code and the models are available online": https://github.com/facebookresearch/diplomacy_searchbot |
| Open Datasets | No | "Our agent is trained purely through self-play with no human data and no reward shaping. We train a DORA agent from scratch with no human data and test it against pre-trained models from previous work [24, 11]." The paper describes generating data through self-play rather than using a publicly available dataset that could be linked or cited. |
| Dataset Splits | No | The paper describes data generation through self-play and a shared experience replay buffer for training, but does not specify fixed training/validation/test splits with percentages or sample counts. Data is generated dynamically during training, and evaluation is done by playing games (a minimal replay-buffer sketch follows this table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using PyTorch but does not provide version numbers for PyTorch or any other key software dependency required for reproducibility. |
| Experiment Setup | No | "Detailed parameters are in Appendix D." The paper defers the detailed experimental setup, including hyperparameters, to Appendix D rather than providing it in the main text. |
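
The "Pseudocode: Yes" row refers to Algorithm 1, Approximated Double Oracle. The paper approximates the equilibrium-solving and best-response steps with learned value functions and regret matching over sampled actions, so the sketch below is not Algorithm 1 itself; it is only a minimal illustration of the classic tabular double-oracle loop that such methods approximate, assuming small finite action sets, numpy, and a payoff function `u(a, b)` giving player 1's payoff in a two-player zero-sum game.

```python
import numpy as np

def solve_matrix_game(M, iters=2000):
    # Approximate a Nash equilibrium of the zero-sum matrix game M
    # (rows: player 1 actions, columns: player 2 actions, entries: payoff
    # to player 1) via regret matching; returns the average strategies.
    m, n = M.shape
    reg1, reg2 = np.zeros(m), np.zeros(n)
    avg1, avg2 = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        p = np.where(reg1 > 0, reg1, 0.0)
        p = p / p.sum() if p.sum() > 0 else np.full(m, 1.0 / m)
        q = np.where(reg2 > 0, reg2, 0.0)
        q = q / q.sum() if q.sum() > 0 else np.full(n, 1.0 / n)
        avg1, avg2 = avg1 + p, avg2 + q
        u1 = M @ q          # payoff of each row action against q
        u2 = -(M.T @ p)     # payoff of each column action against p
        reg1 += u1 - p @ u1
        reg2 += u2 - q @ u2
    return avg1 / avg1.sum(), avg2 / avg2.sum()

def double_oracle(u, actions1, actions2, max_iters=100, tol=1e-6):
    # Classic double-oracle loop: solve the game restricted to small action
    # sets, add each player's best response to the opponent's restricted
    # equilibrium, and repeat until neither best response improves on the
    # restricted game value.
    A, B = [actions1[0]], [actions2[0]]
    for _ in range(max_iters):
        M = np.array([[u(a, b) for b in B] for a in A])
        sigma1, sigma2 = solve_matrix_game(M)
        value = sigma1 @ M @ sigma2
        br1 = max(actions1, key=lambda a: sum(s * u(a, b) for s, b in zip(sigma2, B)))
        br2 = max(actions2, key=lambda b: -sum(s * u(a, b) for s, a in zip(sigma1, A)))
        gain1 = sum(s * u(br1, b) for s, b in zip(sigma2, B)) - value
        gain2 = value - sum(s * u(a, br2) for s, a in zip(sigma1, A))
        if gain1 <= tol and gain2 <= tol:
            break
        if br1 not in A:
            A.append(br1)
        if br2 not in B:
            B.append(br2)
    return sigma1, sigma2, A, B
```

On a toy game such as rock-paper-scissors, this loop grows the restricted action sets until the averaged strategies approach the uniform equilibrium.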
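
Relatedly, the Open Datasets and Dataset Splits rows note that training data is produced online by self-play and written into a shared experience replay buffer, so there is no fixed corpus to split and evaluation happens by playing games. Below is a hypothetical minimal sketch of that data flow; the `policy` and `env` interfaces are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    # Fixed-capacity buffer shared by self-play workers (writers) and the
    # trainer (reader); the oldest transitions are evicted first.
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transitions):
        self.data.extend(transitions)

    def sample(self, batch_size):
        # Copy to a list for simplicity before sampling a training batch.
        return random.sample(list(self.data), batch_size)

def self_play_worker(policy, env, buffer, num_games):
    # Play full games with the current policy and push every transition
    # into the shared buffer; no human data or fixed dataset is involved.
    for _ in range(num_games):
        state, done, transitions = env.reset(), False, []
        while not done:
            action = policy.act(state)
            next_state, reward, done = env.step(action)
            transitions.append((state, action, reward, next_state, done))
            state = next_state
        buffer.add(transitions)
```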