No-Press Diplomacy from Scratch

Authors: Anton Bakhtin, David Wu, Adam Lerer, Noam Brown

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Using this algorithm, we train an agent, DORA, completely from scratch for a popular two-player variant of Diplomacy and show that it achieves superhuman performance. Additionally, we extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data. We present evidence that this agent plays a strategy that is incompatible with human-data bootstrapped agents. This presents the first strong evidence of multiple equilibria in Diplomacy and suggests that self-play alone may be insufficient for achieving superhuman performance in Diplomacy." |
| Researcher Affiliation | Industry | Anton Bakhtin, David Wu, Adam Lerer, Noam Brown (Facebook AI Research) {yolo,dwu,alerer,noambrown}@fb.com |
| Pseudocode | Yes | "Algorithm 1 Approximated Double Oracle" (a generic double-oracle sketch follows this table) |
| Open Source Code | Yes | "The code and the models are available online": https://github.com/facebookresearch/diplomacy_searchbot |
| Open Datasets | No | "Our agent is trained purely through self-play with no human data and no reward shaping. We train a DORA agent from scratch with no human data and test it against pre-trained models from previous work [24, 11]." The paper describes generating data through self-play rather than using a publicly available dataset that could be linked or cited. |
| Dataset Splits | No | The paper describes data generation through self-play and a shared experience replay buffer for training, but does not specify fixed training/validation/test splits with percentages or sample counts. Data is generated dynamically during training, and evaluation is done by playing games (a minimal replay-buffer sketch follows this table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using PyTorch but does not provide version numbers for PyTorch or any other key software dependency required for reproducibility. |
| Experiment Setup | No | "Detailed parameters are in Appendix D." The paper defers the detailed experimental setup, including hyperparameters, to Appendix D rather than providing it in the main text. |
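
The "Pseudocode: Yes" row refers to Algorithm 1, Approximated Double Oracle. The paper approximates the equilibrium-solving and best-response steps with learned value functions and regret matching over sampled actions, so the sketch below is not Algorithm 1 itself; it is only a minimal illustration of the classic tabular double-oracle loop that such methods approximate, assuming small finite action sets, numpy, and a payoff function `u(a, b)` giving player 1's payoff in a two-player zero-sum game.

```python
import numpy as np

def solve_matrix_game(M, iters=2000):
    # Approximate a Nash equilibrium of the zero-sum matrix game M
    # (rows: player 1 actions, columns: player 2 actions, entries: payoff
    # to player 1) via regret matching; returns the average strategies.
    m, n = M.shape
    reg1, reg2 = np.zeros(m), np.zeros(n)
    avg1, avg2 = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        p = np.where(reg1 > 0, reg1, 0.0)
        p = p / p.sum() if p.sum() > 0 else np.full(m, 1.0 / m)
        q = np.where(reg2 > 0, reg2, 0.0)
        q = q / q.sum() if q.sum() > 0 else np.full(n, 1.0 / n)
        avg1, avg2 = avg1 + p, avg2 + q
        u1 = M @ q          # payoff of each row action against q
        u2 = -(M.T @ p)     # payoff of each column action against p
        reg1 += u1 - p @ u1
        reg2 += u2 - q @ u2
    return avg1 / avg1.sum(), avg2 / avg2.sum()

def double_oracle(u, actions1, actions2, max_iters=100, tol=1e-6):
    # Classic double-oracle loop: solve the game restricted to small action
    # sets, add each player's best response to the opponent's restricted
    # equilibrium, and repeat until neither best response improves on the
    # restricted game value.
    A, B = [actions1[0]], [actions2[0]]
    for _ in range(max_iters):
        M = np.array([[u(a, b) for b in B] for a in A])
        sigma1, sigma2 = solve_matrix_game(M)
        value = sigma1 @ M @ sigma2
        br1 = max(actions1, key=lambda a: sum(s * u(a, b) for s, b in zip(sigma2, B)))
        br2 = max(actions2, key=lambda b: -sum(s * u(a, b) for s, a in zip(sigma1, A)))
        gain1 = sum(s * u(br1, b) for s, b in zip(sigma2, B)) - value
        gain2 = value - sum(s * u(a, br2) for s, a in zip(sigma1, A))
        if gain1 <= tol and gain2 <= tol:
            break
        if br1 not in A:
            A.append(br1)
        if br2 not in B:
            B.append(br2)
    return sigma1, sigma2, A, B
```

On a toy game such as rock-paper-scissors, this loop grows the restricted action sets until the averaged strategies approach the uniform equilibrium.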
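
Relatedly, the Open Datasets and Dataset Splits rows note that training data is produced online by self-play and written into a shared experience replay buffer, so there is no fixed corpus to split and evaluation happens by playing games. Below is a hypothetical minimal sketch of that data flow; the `policy` and `env` interfaces are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    # Fixed-capacity buffer shared by self-play workers (writers) and the
    # trainer (reader); the oldest transitions are evicted first.
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transitions):
        self.data.extend(transitions)

    def sample(self, batch_size):
        # Copy to a list for simplicity before sampling a training batch.
        return random.sample(list(self.data), batch_size)

def self_play_worker(policy, env, buffer, num_games):
    # Play full games with the current policy and push every transition
    # into the shared buffer; no human data or fixed dataset is involved.
    for _ in range(num_games):
        state, done, transitions = env.reset(), False, []
        while not done:
            action = policy.act(state)
            next_state, reward, done = env.step(action)
            transitions.append((state, action, reward, next_state, done))
            state = next_state
        buffer.add(transitions)
```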