reproducibilityindex.ai

AutoCFR: Learning to Design Counterfactual Regret Minimization Algorithms

Authors: Hang Xu, Kai Li, Haobo Fu, Qiang Fu, Junliang Xing
Pages: 5244-5251

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This work proposes to meta-learn novel CFR algorithms through evolution to ease the burden of manual algorithm design. We first design a search language that is rich enough to represent many existing hand-designed CFR variants. We then exploit a scalable regularized evolution algorithm with a bag of acceleration techniques to efficiently search over the combinatorial space of algorithms defined by this language. The learned novel CFR algorithm can generalize to new IIGs not seen during training and performs on par with or better than existing state-of-the-art CFR variants. The code is available at https://github.com/rpSebastian/AutoCFR. From the Experiments section: We first describe the experimental setup, including training games, testing games, and training details. We then analyze the characteristics of the learned algorithm and compare it with state-of-the-art CFR variants. Finally, we conduct some ablations to understand the settings of our framework.
Researcher Affiliation	Collaboration	Hang Xu (1,2), Kai Li (1,2), Haobo Fu (4), Qiang Fu (4), Junliang Xing (1,2,3). (1) Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) Tsinghua University; (4) Tencent AI Lab. {xuhang2020, kai.li}@ia.ac.cn, {haobofu, leonfu}@tencent.com, jlxing@tsinghua.edu.cn
Pseudocode	Yes	Algorithm 1: AutoCFR's training procedure, and Algorithm 2: Inner loop procedure Eval(A, G).
Open Source Code	Yes	The code is available at https://github.com/rpSebastian/AutoCFR.
Open Datasets	Yes	We use some commonly used extensive-form games in the IIG research community. Kuhn Poker is a simplified form of poker, with three cards in a deck and one chance to bet for each player. Leduc Poker is a larger game with a 6-card deck and two rounds. In Liar's Dice (x), each player gets an x-sided dice, rolls them at the start, and then takes turns placing bets on the outcome. Goofspiel (x) is a card game where each player has x cards and tries to obtain more points by making sealed bids in x rounds. HUNL Subgame (x) is a heads-up no-limit Texas hold'em (HUNL) sub-game generated by Libratus (Brown and Sandholm 2017, 2018).
Dataset Splits	No	No explicit training/test/validation dataset splits are mentioned. The paper mentions 'training games G' and 'testing IIGs G' and a 'hurdle game Gh' for early stopping during the search, but not a distinct validation split for hyperparameter tuning of the learned CFR variant itself.
Hardware Specification	No	The paper states 'We train AutoCFR on a distributed server with 250 CPU cores and run for about 8 hours,' but does not specify CPU model, memory, or GPU details.
Software Dependencies	No	No specific software dependencies with version numbers are provided.
Experiment Setup	Yes	The population size P is 300, and the tournament size T is 25, the same as those used in (Co-Reyes et al. 2020). The parent program mutates with 0.95 probability and remains the same otherwise. We train AutoCFR on a distributed server with 250 CPU cores and run for about 8 hours... For the inner loop evaluation procedure Eval(A, G), we set iteration M to 1,000 in all games, except for in Liar's Dice (4), where M is 100 since it is a relatively large game.
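
The Experiment Setup row quotes a population size of 300, a tournament size of 25, and a mutation probability of 0.95. As a rough illustration of how those three settings interact, here is a minimal sketch of a generic regularized (aging) evolution loop. It is not the authors' implementation: the function names, the toy integer-vector "program", and the toy fitness are all hypothetical stand-ins, and the real system scores candidates with Eval(A, G) on training games rather than with a closed-form function.

```python
import random
from collections import deque


def regularized_evolution(init_program, mutate, fitness, population_size=300,
                          tournament_size=25, mutation_prob=0.95, steps=500,
                          seed=0):
    """Generic aging-evolution loop (hypothetical sketch, not the paper's code).

    Defaults for population_size, tournament_size and mutation_prob follow the
    values quoted in the Experiment Setup row; steps and seed are assumptions.
    """
    rng = random.Random(seed)
    # The population is a FIFO queue of (program, score) pairs.
    start = (init_program, fitness(init_program))
    population = deque(start for _ in range(population_size))
    best = start
    for _ in range(steps):
        # Tournament selection: sample T members, keep the fittest as parent.
        contestants = rng.sample(list(population), tournament_size)
        parent, _ = max(contestants, key=lambda pair: pair[1])
        # Mutate with probability mutation_prob, otherwise copy the parent.
        child = mutate(parent, rng) if rng.random() < mutation_prob else parent
        scored = (child, fitness(child))
        # Regularization by age: add the child, drop the oldest member.
        population.append(scored)
        population.popleft()
        if scored[1] > best[1]:
            best = scored
    return best


# Toy stand-ins (assumptions): a "program" is a tuple of integers, and the
# fitness is higher the closer every entry is to 3.
def toy_fitness(program):
    return -sum((x - 3) ** 2 for x in program)


def toy_mutate(program, rng):
    i = rng.randrange(len(program))
    step = rng.choice((-1, 1))
    return program[:i] + (program[i] + step,) + program[i + 1:]


if __name__ == "__main__":
    program, score = regularized_evolution((0, 0, 0), toy_mutate, toy_fitness)
    print(program, score)
```

Removing the oldest member rather than the worst one is what makes the loop "regularized": a program survives only if its descendants keep winning tournaments, not because of a single lucky evaluation.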