AutoCFR: Learning to Design Counterfactual Regret Minimization Algorithms

Authors: Hang Xu, Kai Li, Haobo Fu, Qiang Fu, Junliang Xing
Pages: 5244-5251

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work proposes to meta-learn novel CFR algorithms through evolution to ease the burden of manual algorithm design. We first design a search language that is rich enough to represent many existing hand-designed CFR variants. We then exploit a scalable regularized evolution algorithm with a bag of acceleration techniques to efficiently search over the combinatorial space of algorithms defined by this language. The learned novel CFR algorithm can generalize to new IIGs not seen during training and performs on par with or better than existing state-of-the-art CFR variants. The code is available at https://github.com/rpSebastian/AutoCFR. From the Experiments section: We first describe the experimental setup, including training games, testing games, and training details. We then analyze the characteristics of the learned algorithm and compare it with state-of-the-art CFR variants. Finally, we conduct some ablations to understand the settings of our framework.
Researcher Affiliation | Collaboration | Hang Xu (1,2), Kai Li (1,2), Haobo Fu (4), Qiang Fu (4), Junliang Xing (1,2,3). (1) Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) Tsinghua University; (4) Tencent AI Lab. {xuhang2020, kai.li}@ia.ac.cn, {haobofu, leonfu}@tencent.com, jlxing@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: AutoCFR's training procedure; Algorithm 2: Inner loop procedure Eval(A, G).
Open Source Code | Yes | The code is available at https://github.com/rpSebastian/AutoCFR.
Open Datasets | Yes | We use some commonly used extensive-form games in the IIG research community. Kuhn Poker is a simplified form of poker, with three cards in a deck and one chance to bet for each player. Leduc Poker is a larger game with a 6-card deck and two rounds. In Liar's Dice (x), each player gets an x-sided die, rolls it at the start, and then takes turns placing bets on the outcome. Goofspiel (x) is a card game where each player has x cards and tries to obtain more points by making sealed bids in x rounds. HUNL Subgame (x) is a heads-up no-limit Texas hold'em (HUNL) subgame generated by Libratus (Brown and Sandholm 2017, 2018).
Dataset Splits | No | No explicit training/test/validation dataset splits are mentioned. The paper mentions 'training games G' and 'testing IIGs G' and a 'hurdle game Gh' for early stopping during the search, but not a distinct validation split for hyperparameter tuning of the learned CFR variant itself.
Hardware Specification | No | The paper states 'We train AutoCFR on a distributed server with 250 CPU cores and run for about 8 hours,' but does not specify CPU model, memory, or GPU details.
Software Dependencies | No | No specific software dependencies with version numbers are provided.
Experiment Setup | Yes | The population size P is 300 and the tournament size T is 25, the same as those used in (Co-Reyes et al. 2020). The parent program mutates with 0.95 probability and remains the same otherwise. We train AutoCFR on a distributed server with 250 CPU cores and run for about 8 hours... For the inner loop evaluation procedure Eval(A, G), we set the iteration count M to 1,000 in all games, except in Liar's Dice (4), where M is 100 since it is a relatively large game.
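
The Experiment Setup row reports the outer-loop settings (P = 300, T = 25, mutation probability 0.95) without showing how they fit together. The sketch below is one plausible reading of a regularized (aging) evolution loop using those three values; it is not taken from the AutoCFR code. The program representation (a list of integers), `toy_fitness`, and `toy_mutate` are hypothetical stand-ins: in the paper a program is a CFR variant in the search language, and its score would come from Eval(A, G).

```python
import random
from collections import deque

POPULATION_SIZE = 300  # P, as reported in the setup row
TOURNAMENT_SIZE = 25   # T, as reported in the setup row
MUTATION_PROB = 0.95   # parent mutates with this probability


def toy_fitness(program):
    """Hypothetical stand-in for Eval(A, G); higher is better."""
    return -sum(x * x for x in program)


def toy_mutate(program, rng):
    """Hypothetical mutation: nudge one randomly chosen entry by +/-1."""
    child = list(program)
    index = rng.randrange(len(child))
    child[index] += rng.choice((-1, 1))
    return child


def regularized_evolution(steps, rng, program_len=4):
    # Members are (program, fitness) pairs; the deque keeps them in age order.
    population = deque()
    for _ in range(POPULATION_SIZE):
        program = [rng.randint(-10, 10) for _ in range(program_len)]
        population.append((program, toy_fitness(program)))
    best = max(population, key=lambda pair: pair[1])

    for _ in range(steps):
        # Tournament selection: sample T members; the fittest is the parent.
        contestants = rng.sample(list(population), TOURNAMENT_SIZE)
        parent = max(contestants, key=lambda pair: pair[1])[0]
        # Mutate with probability 0.95, otherwise copy the parent unchanged.
        if rng.random() < MUTATION_PROB:
            child = toy_mutate(parent, rng)
        else:
            child = list(parent)
        entry = (child, toy_fitness(child))
        population.append(entry)
        population.popleft()  # aging: discard the oldest member, not the worst
        if entry[1] > best[1]:
            best = entry
    return best, population
```

Calling `regularized_evolution(200, random.Random(0))` returns the best (program, fitness) pair seen so far together with the final population. Removing the oldest member rather than the least fit one is the defining feature of regularized evolution; the hurdle game and other acceleration techniques mentioned in the table are omitted here.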