Regret-Minimizing Double Oracle for Extensive-Form Games
Authors: Xiaohang Tang, Le Cong Dinh, Stephen Marcus Mcaleer, Yaodong Yang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods. Empirical assessments on typical poker and board games demonstrate that PDO achieves much faster convergence compared to XDO and ODO. |
| Researcher Affiliation | Academia | 1Department of Statistical Science, University College London 2University of Southampton 3Carnegie Mellon University 4Institute for AI, Peking University. |
| Pseudocode | Yes | Algorithm 1 XDO (Mc Aleer et al., 2021) ... Algorithm 2 Regret-Minimizing Double Oracle ... Algorithm 3 Extensive-Form Online Double Oracle ... Algorithm 4 Periodic Double Oracle |
| Open Source Code | Yes | Code is available at https://github.com/xiaohangt/RMDO. |
| Open Datasets | Yes | We conducted empirical assessments on a variety of extensive-form games, including Sequential Blotto (perfect-information extensive-form game), Kuhn Poker with a initial pot 40 for each player, Leduc Poker, Leduc Poker Dummy and Oshi Zumo. The full description of games is in Appendix C. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits. It evaluates game-solving algorithms on entire game environments rather than partitioned datasets. |
| Hardware Specification | No | The authors would like to thank the Department of Statistical Science at University College London for providing the computing clusters. This mentions a type of resource but lacks specific hardware details like GPU/CPU models. |
| Software Dependencies | No | All experiments and algorithm implementations are based on Open Spiel (Lanctot et al., 2019). The experiment uses the state-of-the-art exact regret minimizer, CFR+ (Tammelin, 2014). While software is mentioned, specific version numbers are not provided for Open Spiel, CFR+, or any other libraries. |
| Experiment Setup | Yes | Input: hyperparameter m, window index j = 0, uniform random strategy π0... if t mod m(j) = 0 then... We present a comparison between the XODO and PDO with periodicity values of m = 1, 10, 50, 100... The Extensive-form Double Oracle (XDO) algorithm is initialized with a given threshold ϵ0, which is divided by two each time the local exploitability of the regret minimizer meets the threshold. |