No-Press Diplomacy from Scratch
Authors: Anton Bakhtin, David Wu, Adam Lerer, Noam Brown
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using this algorithm, we train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance. Additionally, we extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data. We present evidence that this agent plays a strategy that is incompatible with human-data bootstrapped agents. This presents the first strong evidence of multiple equilibria in Diplomacy and suggests that self play alone may be insufficient for achieving superhuman performance in Diplomacy. |
| Researcher Affiliation | Industry | Anton Bakhtin, David Wu, Adam Lerer, Noam Brown; Facebook AI Research; {yolo,dwu,alerer,noambrown}@fb.com |
| Pseudocode | Yes | Algorithm 1 Approximated Double Oracle |
| Open Source Code | Yes | The code and the models are available online: https://github.com/facebookresearch/diplomacy_searchbot |
| Open Datasets | No | Our agent is trained purely through self-play with no human data and no reward shaping. We train a DORA agent from scratch with no human data and test it against pre-trained models from previous work [24, 11]. The paper describes generating data through self-play rather than using a publicly available dataset that requires a link or citation. |
| Dataset Splits | No | The paper describes data generation through self-play and a shared experience replay buffer for training but does not specify fixed training/validation/test dataset splits with percentages or sample counts. Data is dynamically generated for training and evaluation is done by playing games. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using PyTorch but does not provide specific version numbers for PyTorch or any other key software dependencies required for reproducibility. |
| Experiment Setup | No | Detailed parameters are in Appendix D. The paper defers the details of the experimental setup, including hyperparameters, to Appendix D rather than providing them in the main text. |