XDO: A Double Oracle Algorithm for Extensive-Form Games

Authors: Stephen McAleer, JB Lanier, Kevin A Wang, Pierre Baldi, Roy Fox

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game.
Researcher Affiliation | Academia | Stephen McAleer, Department of Computer Science, University of California, Irvine, smcaleer@uci.edu; John Lanier, Department of Computer Science, University of California, Irvine, jblanier@uci.edu; Kevin A. Wang, Department of Computer Science, University of California, Irvine, kevinwang@kevinwang.us; Pierre Baldi, Department of Computer Science, University of California, Irvine, pfbaldi@ics.uci.edu; Roy Fox, Department of Computer Science, University of California, Irvine, royf@uci.edu
Pseudocode | Yes | Algorithm 1 XDO; Algorithm 2 NXDO
Open Source Code | Yes | Experiment code is available at https://github.com/indylab/nxdo.
Open Datasets | No | The paper refers to established game environments such as Leduc poker and Oshi-Zumo, and to custom games such as m-Clone Leduc and the Loss Game. Although it mentions using OpenSpiel [Lanctot et al., 2019] for some implementations, it does not provide concrete access information (a specific link, DOI, or repository) for a publicly available 'dataset' of game states or play traces.
Dataset Splits | No | The paper describes experiments in various game environments but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for the data used in these experiments.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU models, CPU types, or memory specifications, for the machines used to run the experiments.
Software Dependencies | No | The paper mentions several software components and algorithms (e.g., CFR, XFP, PSRO, NFSP, PPO, DDQN, OpenSpiel) but does not provide specific version numbers for any of them (e.g., 'PyTorch 1.9' or 'OpenSpiel v0.1').
Experiment Setup | No | The paper states, 'Training details and an analysis on the proportion of experience spent in the inner vs outer loop of NXDO are included in supplementary materials.' This indicates that specific experimental setup details, such as hyperparameters or training configurations, are not present in the main text of the paper.