Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent

Authors: Weiming Liu, Huacong Jiang, Bin Li, Houqiang Li

ICML 2022

Reproducibility Variable Result LLM Response
Research Type Experimental The experimental results show that the two variants converge faster than conventional FTRL and OMD, even faster than vanilla CFR and CFR+ in some EFGs.

4. Experimental Investigation: To better understand the properties of FD-FTRL(R) and FD-OMD(R), we test two different methods for setting the weighting parameters λ_j^t: 1) Linear Weighting (LW): λ_j^t = η·ȳ_j^t·n_j·|A|²·t; 2) Constant Weighting (CW): λ_j^t = η·ȳ_j^t·n_j·|A|²·T. In the equations, y_j^t denotes the opponent's probability of reaching j, and ȳ_j^t is the average probability. η ∈ (0, ∞) is a global hyperparameter. These configurations are inspired by the observations that ‖r̂_j^t‖₂² = O(y_j^t·n_j·|A|²) and Σ_{k=1}^t ‖r̂_j^k‖₂² = O(ȳ_j^t·n_j·|A|²·t) (Brown & Sandholm, 2016). According to Corollary 3.9, FD-FTRL(R) and FD-OMD(R) with both weighting methods have a sub-linear regret bound O(√T).

For computing the average strategy, we use: 1) Uniform Averaging (UA): x̄^T = (1/T) Σ_{t=1}^T x^t; 2) Linear Averaging (LA): x̄^T = (2/(T(T+1))) Σ_{t=1}^T t·x^t. FD-FTRL(R) and FD-OMD(R) use CW and LA by default. The algorithms are compared with FTRL, OMD, and CFRs. For a fair comparison, the competitors also use LA for computing the average strategies, namely Linear CFR (LCFR) (Brown & Sandholm, 2019b), CFR+, FTRL(LA), and OMD(LA). FTRL(LA) and OMD(LA) are the algorithms with LA and a regularizer q^{0:t}(x) = q^0(x) = Σ_{j∈J} x[p_j]·(1/(2β_j))·‖x̂_j‖₂² for t > 0. We set β_j for each decision point j in FTRL(LA) and OMD(LA) according to (Farina et al., 2019b): β_j = 2σ + 2·max_{a∈A_j} Σ_{j'∈C_{j,a}} β_{j'}, where σ is a hyper-parameter. In FD-FTRL(R) and FTRL(LA), the losses are also weighted linearly, as in LCFR. All the algorithms use alternating updates, as in CFRs. We run a coarse grid search to tune the hyper-parameter in each algorithm, and the best value is used for the comparison. The tuning results of FD-FTRL(R) and FD-OMD(R) are given in Appendix G.

We conduct our experiments in eight benchmark games, including Leduc (Southey et al., 2012) and FHP (Brown et al., 2019). A description of the games is given in Appendix F.
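The two averaging rules quoted above (UA and LA) are easy to pin down in code. A minimal NumPy sketch; the three-iterate strategy sequence is a made-up example, not from the paper:

```python
import numpy as np

def uniform_average(strategies):
    """UA: x̄^T = (1/T) Σ_t x^t."""
    return np.mean(strategies, axis=0)

def linear_average(strategies):
    """LA: x̄^T = (2/(T(T+1))) Σ_t t·x^t, so later iterates weigh more."""
    T = len(strategies)
    t = np.arange(1, T + 1, dtype=float)
    return (t[:, None] * np.asarray(strategies)).sum(axis=0) * 2.0 / (T * (T + 1))

# Hypothetical three-iterate sequence over two actions.
xs = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
ua = uniform_average(xs)   # [0.5, 0.5]
la = linear_average(xs)    # (1·x¹ + 2·x² + 3·x³)/6 = [1/3, 2/3]
```

LA discounts the early, typically poor iterates, which is why schemes like CFR+ and LCFR pair naturally with it.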
Researcher Affiliation Academia 1) School of Data Science, University of Science and Technology of China, Hefei, China; 2) Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China.
Pseudocode Yes
Algorithm 1 Regret Minimization Framework
1: for iteration t = 1 to T do
2:   l^t ← ObserveLoss(x^t)
3:   x^{t+1} ← Update(l^1, ..., l^t, x^1, ..., x^t)
4: end for
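The framework above is generic: Update can be instantiated with any regret minimizer. As an illustration, a minimal runnable sketch that plugs regret matching (the update used by vanilla CFR at a single decision point) into the loop; the function names and the toy loss are assumptions for the sketch, not the paper's code:

```python
import numpy as np

def regret_matching_update(cum_regret):
    """Map cumulative regrets to a strategy: normalize the positive part,
    fall back to uniform when no action has positive regret."""
    pos = np.maximum(cum_regret, 0.0)
    s = pos.sum()
    return pos / s if s > 0 else np.full_like(pos, 1.0 / len(pos))

def run(loss_fn, n_actions, T):
    """Algorithm 1 with Update = regret matching; returns the UA strategy."""
    x = np.full(n_actions, 1.0 / n_actions)
    cum_regret = np.zeros(n_actions)
    avg = np.zeros(n_actions)
    for t in range(1, T + 1):
        l = loss_fn(x)                          # line 2: observe loss l^t
        cum_regret += l @ x - l                 # regret of each action vs. x^t
        avg += x                                # running sum for averaging
        x = regret_matching_update(cum_regret)  # line 3: compute x^{t+1}
    return avg / T

# Toy loss: action 1 is always worse, so the average strategy should
# concentrate on action 0.
avg = run(lambda x: np.array([0.1, 0.9]), n_actions=2, T=1000)
```

Regret matching only needs the cumulative regrets, so Update here ignores the full histories that the general framework passes in.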
Open Source Code No The experiments use part of the code of the OpenSpiel project (Lanctot et al., 2019).
Open Datasets Yes We conduct our experiments in eight benchmark games, including Leduc (Southey et al., 2012) and FHP (Brown et al., 2019). A description of the games is given in Appendix F.
Dataset Splits No The paper describes experiments run on benchmark games (environments) but does not specify explicit training/validation/test dataset splits, which are typically defined for traditional datasets.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies No The experiments use part of the code of the OpenSpiel project (Lanctot et al., 2019).
Experiment Setup Yes To better understand the properties of FD-FTRL(R) and FD-OMD(R), we test two different methods for setting the weighting parameters λ_j^t: 1) Linear Weighting (LW): λ_j^t = η·ȳ_j^t·n_j·|A|²·t; 2) Constant Weighting (CW): λ_j^t = η·ȳ_j^t·n_j·|A|²·T. In the equations, y_j^t denotes the opponent's probability of reaching j, and ȳ_j^t is the average probability. η ∈ (0, ∞) is a global hyperparameter. ... For computing the average strategy, we use: 1) Uniform Averaging (UA): x̄^T = (1/T) Σ_{t=1}^T x^t; 2) Linear Averaging (LA): x̄^T = (2/(T(T+1))) Σ_{t=1}^T t·x^t. ... We set β_j for each decision point j in FTRL(LA) and OMD(LA) according to (Farina et al., 2019b): β_j = 2σ + 2·max_{a∈A_j} Σ_{j'∈C_{j,a}} β_{j'}, where σ is a hyper-parameter. ... We run a coarse grid search to tune the hyper-parameter in each algorithm, and the best value is used for the comparison. The tuning results of FD-FTRL(R) and FD-OMD(R) are given in Appendix G.
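The β_j recursion quoted above is defined bottom-up over the player's decision points: each β_j depends on the β values of the decision points reachable from j. A hedged sketch under an assumed toy tree encoding (children[j] maps each action at j to the decision points it leads to; the encoding and names are illustrative, not the paper's):

```python
def compute_betas(children, sigma):
    """Evaluate β_j = 2σ + 2·max_{a ∈ A_j} Σ_{j' ∈ C_{j,a}} β_{j'} bottom-up.

    children: dict mapping each decision point j to {action: [child points]}.
    sigma: the global hyper-parameter σ.
    """
    betas = {}

    def beta(j):
        if j not in betas:
            # Sum of child betas for each action; empty max defaults to 0
            # at leaf decision points.
            sums = [sum(beta(c) for c in kids) for kids in children[j].values()]
            betas[j] = 2 * sigma + 2 * max(sums, default=0)
        return betas[j]

    for j in children:
        beta(j)
    return betas

# Tiny hypothetical tree: root j0 has two actions; action 'a' leads to
# decision points j1 and j2, action 'b' leads to none.
children = {"j0": {"a": ["j1", "j2"], "b": []}, "j1": {}, "j2": {}}
betas = compute_betas(children, sigma=1.0)
# Leaves: β = 2σ = 2.  Root: 2σ + 2·(2 + 2) = 10.
```

Because the recursion compounds from leaves to root, deeper trees get larger β_j near the root, which is how the dilated regularizer accounts for the size of the subtree below each decision point.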