Regret-Minimizing Double Oracle for Extensive-Form Games

Authors: Xiaohang Tang, Le Cong Dinh, Stephen Marcus Mcaleer, Yaodong Yang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods. Empirical assessments on typical poker and board games demonstrate that PDO achieves much faster convergence compared to XDO and ODO.
Researcher Affiliation Academia 1Department of Statistical Science, University College London 2University of Southampton 3Carnegie Mellon University 4Institute for AI, Peking University.
Pseudocode Yes Algorithm 1 XDO (Mc Aleer et al., 2021) ... Algorithm 2 Regret-Minimizing Double Oracle ... Algorithm 3 Extensive-Form Online Double Oracle ... Algorithm 4 Periodic Double Oracle
Open Source Code Yes Code is available at https://github.com/xiaohangt/RMDO.
Open Datasets Yes We conducted empirical assessments on a variety of extensive-form games, including Sequential Blotto (perfect-information extensive-form game), Kuhn Poker with a initial pot 40 for each player, Leduc Poker, Leduc Poker Dummy and Oshi Zumo. The full description of games is in Appendix C.
Dataset Splits No The paper does not specify explicit training, validation, or test dataset splits. It evaluates game-solving algorithms on entire game environments rather than partitioned datasets.
Hardware Specification No The authors would like to thank the Department of Statistical Science at University College London for providing the computing clusters. This mentions a type of resource but lacks specific hardware details like GPU/CPU models.
Software Dependencies No All experiments and algorithm implementations are based on Open Spiel (Lanctot et al., 2019). The experiment uses the state-of-the-art exact regret minimizer, CFR+ (Tammelin, 2014). While software is mentioned, specific version numbers are not provided for Open Spiel, CFR+, or any other libraries.
Experiment Setup Yes Input: hyperparameter m, window index j = 0, uniform random strategy π0... if t mod m(j) = 0 then... We present a comparison between the XODO and PDO with periodicity values of m = 1, 10, 50, 100... The Extensive-form Double Oracle (XDO) algorithm is initialized with a given threshold ϵ0, which is divided by two each time the local exploitability of the regret minimizer meets the threshold.