AutoCFR: Learning to Design Counterfactual Regret Minimization Algorithms
Authors: Hang Xu, Kai Li, Haobo Fu, Qiang Fu, Junliang Xing
AAAI 2022, pp. 5244-5251
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work proposes to meta-learn novel CFR algorithms through evolution to ease the burden of manual algorithm design. We first design a search language that is rich enough to represent many existing hand-designed CFR variants. We then exploit a scalable regularized evolution algorithm with a bag of acceleration techniques to efficiently search over the combinatorial space of algorithms defined by this language. The learned novel CFR algorithm can generalize to new IIGs not seen during training and performs on par with or better than existing state-of-the-art CFR variants. The code is available at https://github.com/rpSebastian/AutoCFR. and Experiments: We first describe the experimental setup, including training games, testing games, and training details. We then analyze the characteristics of the learned algorithm and compare it with state-of-the-art CFR variants. Finally, we conduct some ablations to understand the settings of our framework. |
| Researcher Affiliation | Collaboration | Hang Xu (1,2), Kai Li (1,2), Haobo Fu (4), Qiang Fu (4), Junliang Xing (1,2,3); 1 Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; 3 Tsinghua University; 4 Tencent AI Lab; {xuhang2020, kai.li}@ia.ac.cn, {haobofu, leonfu}@tencent.com, jlxing@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: AutoCFR's training procedure. and Algorithm 2: Inner loop procedure Eval(A, G). |
| Open Source Code | Yes | The code is available at https://github.com/rpSebastian/AutoCFR. |
| Open Datasets | Yes | We use some commonly used extensive-form games in the IIG research community. Kuhn Poker is a simplified form of poker, with three cards in a deck and one chance to bet for each player. Leduc Poker is a larger game with a 6-card deck and two rounds. In Liar's Dice (x), each player gets an x-sided dice, rolls them at the start, and then takes turns placing bets on the outcome. Goofspiel (x) is a card game where each player has x cards and tries to obtain more points by making sealed bids in x rounds. HUNL Subgame (x) is a heads-up no-limit Texas hold'em (HUNL) subgame generated by Libratus (Brown and Sandholm 2017, 2018). |
| Dataset Splits | No | No explicit training/test/validation dataset splits are mentioned. The paper mentions 'training games G' and 'testing IIGs G' and a 'hurdle game Gh' for early stopping during the search, but not a distinct validation split for hyperparameter tuning of the learned CFR variant itself. |
| Hardware Specification | No | The paper states 'We train Auto CFR on a distributed server with 250 CPU cores and run for about 8 hours,' but does not specify CPU model, memory, or GPU details. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. |
| Experiment Setup | Yes | The population size P is 300, and the tournament size T is 25, the same as those used in (Co-Reyes et al. 2020). The parent program mutates with 0.95 probability and remains the same otherwise. We train AutoCFR on a distributed server with 250 CPU cores and run for about 8 hours... For the inner loop evaluation procedure Eval(A, G), we set iteration M to 1,000 in all games, except for in Liar's Dice (4), where M is 100 since it is a relatively large game. |
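
The Research Type and Experiment Setup rows describe the paper's outer search loop: regularized evolution with a population of 300, tournament size 25, and a 0.95 mutation probability. Below is a minimal Python sketch of that loop under those reported settings; the candidate programs, `mutate`, and `eval_fitness` are hypothetical placeholders standing in for the paper's search language and its Eval(A, G) procedure, not the authors' implementation.

```python
# Minimal sketch of regularized evolution with the hyperparameters reported in
# the paper. Candidate representation, mutation, and fitness are placeholders.
import random
from collections import deque

POPULATION_SIZE = 300   # P in the paper
TOURNAMENT_SIZE = 25    # T in the paper
MUTATION_PROB = 0.95    # probability that the selected parent is mutated

def random_program(length=16):
    """Placeholder candidate: a random bit string standing in for a CFR variant."""
    return [random.randint(0, 1) for _ in range(length)]

def mutate(program):
    """Placeholder point mutation: flip one randomly chosen position."""
    child = list(program)
    i = random.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def eval_fitness(program):
    """Placeholder fitness; the paper instead runs Eval(A, G) on training games."""
    return sum(program)

def regularized_evolution(num_cycles=1000):
    # Initialize the population with random candidates and their fitness.
    population = deque()
    for _ in range(POPULATION_SIZE):
        prog = random_program()
        population.append((prog, eval_fitness(prog)))

    best = max(population, key=lambda x: x[1])
    for _ in range(num_cycles):
        # Tournament selection: sample T individuals, pick the fittest as parent.
        tournament = random.sample(list(population), TOURNAMENT_SIZE)
        parent = max(tournament, key=lambda x: x[1])[0]
        # Mutate with probability 0.95, otherwise keep the parent unchanged.
        child = mutate(parent) if random.random() < MUTATION_PROB else list(parent)
        score = eval_fitness(child)
        # Regularized evolution removes the oldest individual, not the worst.
        population.popleft()
        population.append((child, score))
        if score > best[1]:
            best = (child, score)
    return best

if __name__ == "__main__":
    program, fitness = regularized_evolution()
    print("best placeholder fitness:", fitness)
```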
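The Pseudocode row references Algorithm 2, the inner-loop procedure Eval(A, G). The sketch below illustrates that kind of evaluation, assuming OpenSpiel for game loading and exploitability computation; vanilla CFR stands in for the searched candidate A, whose actual representation comes from the paper's search language.

```python
# Hedged sketch of an inner-loop evaluation: run a candidate algorithm for M
# iterations on a game and score it by the exploitability of its average policy.
import pyspiel
from open_spiel.python.algorithms import cfr, exploitability

def eval_candidate(game_name: str, num_iterations: int) -> float:
    """Return a fitness score (negative exploitability) after M iterations."""
    game = pyspiel.load_game(game_name)
    solver = cfr.CFRSolver(game)  # stand-in for the searched CFR variant A
    for _ in range(num_iterations):
        solver.evaluate_and_update_policy()
    # Lower exploitability is better, so negate it to obtain a score to maximize.
    return -exploitability.exploitability(game, solver.average_policy())

if __name__ == "__main__":
    # The paper sets M = 1,000 in most games; Kuhn Poker is used here as a small example.
    print("fitness on kuhn_poker:", eval_candidate("kuhn_poker", 1000))
```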
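The Open Datasets row lists standard benchmark games. As a usage illustration, the snippet below loads a few of them by their OpenSpiel identifiers; this is an assumption, since the report does not say which game implementation the authors used, and the parameterized Liar's Dice (x) and Goofspiel (x) variants as well as the Libratus-generated HUNL subgames require extra parameters or external definitions not shown here.

```python
# Load default-configuration versions of some games named in the Open Datasets row.
# Identifiers follow the public OpenSpiel registry (an assumption).
import pyspiel

for game_name in ["kuhn_poker", "leduc_poker", "liars_dice", "goofspiel"]:
    game = pyspiel.load_game(game_name)
    print(game_name, "-> players:", game.num_players())
```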