Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
Authors: Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games. |
| Researcher Affiliation | Collaboration | Hang Xu (1,2), Kai Li (1,2), Bingyun Liu (1,2), Haobo Fu (6), Qiang Fu (6), Junliang Xing (5), and Jian Cheng (1,3,4). Affiliations: (1) Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) School of Future Technology, University of Chinese Academy of Sciences; (4) AiRiA; (5) Tsinghua University; (6) Tencent AI Lab |
| Pseudocode | Yes | Algorithm 1: Construction of a weighted CFR variant using a general regret minimization algorithm from player 1's perspective. |
| Open Source Code | Yes | The code is available at https://github.com/rpSebastian/PDCFRPlus. |
| Open Datasets | Yes | Kuhn Poker [Kuhn, 1950] is a simplified poker with a three-card deck and one chance to bet for each player. Leduc Poker [Southey et al., 2005] is a larger game with a 6-card deck and two betting rounds. In Liar's Dice (x) (x=4, 5) [Lisý et al., 2015], each player gets an x-sided dice, which they roll at the start and take turns placing bets on the outcome. Goofspiel (x) (x=4, 5) [Ross, 1971] is a card game where each player has x cards and aims to score points by bidding simultaneously in x rounds. |
| Dataset Splits | No | The paper states 'We perform a coarse grid search to fine-tune the hyperparameters for PDCFR+, and the best one, i.e., α = 2.3 and γ = 5, is then used across all games.' but does not specify a distinct validation set or explicit dataset split for this tuning process. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or computational resources used for the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | For DCFR, we adopt the hyperparameters suggested by the authors, specifically α = 1.5, β = 0, and γ = 2. Regarding DCFR+ and PDCFR+, ... for DCFR+, we set α = 1.5 and γ = 4 ... for PDCFR+, and the best one, i.e., α = 2.3 and γ = 5, is then used across all games. All algorithms utilize the alternating-updates technique. We run each algorithm for 20,000 iterations in each testing game to observe their long-term behavior. |
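
To make the roles of α, β, and γ in the Experiment Setup row concrete, the following is a minimal sketch of the per-iteration discount factors used by DCFR as defined in the original DCFR work (positive regrets scaled by t^α / (t^α + 1), negative regrets by t^β / (t^β + 1), and the average strategy by (t / (t + 1))^γ). The function name `dcfr_discounts` is hypothetical and is not taken from the PDCFRPlus repository. DCFR+ and PDCFR+ use their own weighting rules with only α and γ, which are not reproduced here; see the paper for their exact form.

```python
def dcfr_discounts(t, alpha, beta, gamma):
    """Return DCFR discount factors at iteration t (t >= 1).

    Hypothetical helper for illustration only; it follows the DCFR
    formulas, not the PDCFR+ update rule.
    """
    pos = t**alpha / (t**alpha + 1)   # multiplier for accumulated positive regrets
    neg = t**beta / (t**beta + 1)     # multiplier for accumulated negative regrets
    avg = (t / (t + 1)) ** gamma      # multiplier for the accumulated average strategy
    return pos, neg, avg


# DCFR hyperparameters reported in the table: alpha = 1.5, beta = 0, gamma = 2.
for t in (1, 10, 100):
    print(t, dcfr_discounts(t, alpha=1.5, beta=0, gamma=2))
```

With β = 0 the negative-regret multiplier is 1/2 at every iteration, while the positive-regret and average-strategy multipliers approach 1 as t grows, so early iterations are discounted most heavily.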