reproducibilityindex.ai

Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent

Authors: Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games.
Researcher Affiliation	Collaboration	Hang Xu1,2, Kai Li1,2, Bingyun Liu1,2, Haobo Fu6, Qiang Fu6, Junliang Xing5, and Jian Cheng1,3,4. 1Institute of Automation, Chinese Academy of Sciences; 2School of Artificial Intelligence, University of Chinese Academy of Sciences; 3School of Future Technology, University of Chinese Academy of Sciences; 4AiRiA; 5Tsinghua University; 6Tencent AI Lab
Pseudocode	Yes	Algorithm 1: Construction of a weighted CFR variant using a general regret minimization algorithm from player 1's perspective.
Open Source Code	Yes	The code is available at https://github.com/rpSebastian/PDCFRPlus.
Open Datasets	Yes	Kuhn Poker [Kuhn, 1950] is a simplified poker with a three-card deck and one chance to bet for each player. Leduc Poker [Southey et al., 2005] is a larger game with a 6-card deck and two betting rounds. In Liar's Dice (x) (x=4, 5) [Lisý et al., 2015], each player gets an x-sided dice, which they roll at the start and take turns placing bets on the outcome. Goofspiel (x) (x=4, 5) [Ross, 1971] is a card game where each player has x cards and aims to score points by bidding simultaneously in x rounds.
Dataset Splits	No	The paper states 'We perform a coarse grid search to fine-tune the hyperparameters for PDCFR+, and the best one, i.e., α = 2.3 and γ = 5, is then used across all games.' but does not specify a distinct validation set or explicit dataset split for this tuning process.
Hardware Specification	No	The paper does not provide specific hardware details such as CPU/GPU models, memory, or computational resources used for the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup	Yes	For DCFR, we adopt the hyperparameters suggested by the authors, specifically α = 1.5, β = 0, and γ = 2. Regarding DCFR+ and PDCFR+, ... for DCFR+, we set α = 1.5 and γ = 4 ... for PDCFR+, and the best one, i.e., α = 2.3 and γ = 5, is then used across all games. All algorithms utilize the alternating-updates technique. We run each algorithm for 20,000 iterations in each testing game to observe their long-term behavior.
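
To make the quoted DCFR hyperparameters concrete, the sketch below computes the per-iteration discount factors using the standard DCFR formulation from the original DCFR paper: positive regrets are scaled by t^α / (t^α + 1), negative regrets by t^β / (t^β + 1), and the average strategy by (t / (t + 1))^γ. The function name dcfr_discounts and the SETTINGS dictionary are hypothetical names introduced for illustration; they are not taken from the PDCFRPlus repository. The DCFR+ and PDCFR+ entries are recorded only as quoted values, since the exact update rules for those variants are not stated in this entry.

```python
def dcfr_discounts(t, alpha, beta, gamma):
    """Return DCFR discount factors at iteration t (t >= 1).

    Standard DCFR formulation (an assumption here, not quoted in this entry):
      positive regrets  *= t^alpha / (t^alpha + 1)
      negative regrets  *= t^beta  / (t^beta  + 1)
      average strategy  *= (t / (t + 1))^gamma
    """
    pos = t**alpha / (t**alpha + 1.0)
    neg = t**beta / (t**beta + 1.0)
    avg = (t / (t + 1.0)) ** gamma
    return pos, neg, avg


# Hyperparameter values as quoted in the Experiment Setup row.
# DCFR+ and PDCFR+ list no beta in the quoted text, so none is given here.
SETTINGS = {
    "DCFR": {"alpha": 1.5, "beta": 0.0, "gamma": 2.0},
    "DCFR+": {"alpha": 1.5, "gamma": 4.0},
    "PDCFR+": {"alpha": 2.3, "gamma": 5.0},
}

if __name__ == "__main__":
    # Show how the DCFR discounts evolve over the 20,000-iteration horizon.
    for t in (1, 10, 100, 20000):
        pos, neg, avg = dcfr_discounts(t, **SETTINGS["DCFR"])
        print(f"t={t:>5}  pos={pos:.6f}  neg={neg:.6f}  avg={avg:.6f}")
```

With β = 0 the negative-regret factor is 0.5 at every iteration, while the positive-regret and average-strategy factors approach 1 as t grows, so early iterations are discounted most heavily.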