Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent
Authors: Weiming Liu, Huacong Jiang, Bin Li, Houqiang Li
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that the two variants converge faster than conventional FTRL and OMD, and even faster than vanilla CFR and CFR+ in some EFGs. §4 Experimental Investigation: To better understand the properties of FD-FTRL(R) and FD-OMD(R), we test two different methods for setting the weighting parameters λ_j^t: 1) Linear Weighting (LW): λ_j^t = η·ȳ_j^t·n_j·|A|²·t; 2) Constant Weighting (CW): λ_j^t = η·ȳ_j^t·n_j·|A|²·T. In the equations, y_j^t denotes the opponent's probability of reaching j, and ȳ_j^t is the average probability. η ∈ (0, ∞) is a global hyperparameter. These configurations are inspired by the observations that ‖r̂_j^t‖₂² = O(y_j^t·n_j·|A|²) and Σ_{k=1}^{t} ‖r̂_j^k‖₂² = O(ȳ_j^t·n_j·|A|²·t) (Brown & Sandholm, 2016). According to Corollary 3.9, FD-FTRL(R) and FD-OMD(R) with both weighting methods have a sub-linear regret bound O(√T). For computing the average strategy, we use: 1) Uniform Averaging (UA): x̄^T = (1/T)·Σ_{t=1}^{T} x^t; 2) Linear Averaging (LA): x̄^T = 2/(T(T+1))·Σ_{t=1}^{T} t·x^t. FD-FTRL(R) and FD-OMD(R) use CW and LA by default. The algorithms are compared with FTRL, OMD, and CFRs. For a fair comparison, the competitors also use LA for computing the average strategies, namely Linear CFR (LCFR) (Brown & Sandholm, 2019b), CFR+, FTRL(LA), and OMD(LA). FTRL(LA) and OMD(LA) are the algorithms with LA and a regularizer q^{0:t}(x) = q^0(x) = Σ_{j∈J} x[p_j]·(1/(2β_j))·‖x̂_j‖₂² for t > 0. We set β_j for each point j in FTRL(LA) and OMD(LA) according to (Farina et al., 2019b): β_j = 2σ + 2·max_{a∈A_j} Σ_{j'∈C_{j,a}} β_{j'}, where σ is a hyperparameter. In FD-FTRL(R) and FTRL(LA), the losses are also weighted linearly, as in LCFR. All the algorithms use alternating updates, as in CFRs. We run a coarse grid search to tune the hyperparameter of each algorithm, and the best setting is used for the comparison. The tuning results of FD-FTRL(R) and FD-OMD(R) are given in Appendix G. We conduct our experiments in eight benchmark games, including Leduc (Southey et al., 2012) and FHP (Brown et al., 2019). A description of the games is given in Appendix F. |
| Researcher Affiliation | Academia | ¹School of Data Science, University of Science and Technology of China, Hefei, China; ²Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China. |
| Pseudocode | Yes | Algorithm 1 (Regret Minimization Framework): 1: for iteration t = 1 to T do; 2: lᵗ ← ObserveLoss(xᵗ); 3: xᵗ⁺¹ ← Update(l¹, …, lᵗ, x¹, …, xᵗ); 4: end for |
| Open Source Code | No | The experiments use part of the code of project Open Spiel (Lanctot et al., 2019). |
| Open Datasets | Yes | We conduct our experiments in eight benchmark games, including Leduc (Southey et al., 2012) and FHP (Brown et al., 2019). A description of the games is given in Appendix F. |
| Dataset Splits | No | The paper describes experiments run on benchmark games (environments) but does not specify explicit training/validation/test dataset splits, which are typically defined for traditional datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The experiments use part of the code of project Open Spiel (Lanctot et al., 2019). |
| Experiment Setup | Yes | To better understand the properties of FD-FTRL(R) and FD-OMD(R), we test two different methods for setting the weighting parameters λ_j^t: 1) Linear Weighting (LW): λ_j^t = η·ȳ_j^t·n_j·|A|²·t; 2) Constant Weighting (CW): λ_j^t = η·ȳ_j^t·n_j·|A|²·T. In the equations, y_j^t denotes the opponent's probability of reaching j, and ȳ_j^t is the average probability. η ∈ (0, ∞) is a global hyperparameter. ... For computing the average strategy, we use: 1) Uniform Averaging (UA): x̄^T = (1/T)·Σ_{t=1}^{T} x^t; 2) Linear Averaging (LA): x̄^T = 2/(T(T+1))·Σ_{t=1}^{T} t·x^t. ... We set β_j for each point j in FTRL(LA) and OMD(LA) according to (Farina et al., 2019b): β_j = 2σ + 2·max_{a∈A_j} Σ_{j'∈C_{j,a}} β_{j'}, where σ is a hyperparameter. ... We run a coarse grid search to tune the hyperparameter of each algorithm, and the best setting is used for the comparison. The tuning results of FD-FTRL(R) and FD-OMD(R) are given in Appendix G. |
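The Algorithm 1 skeleton quoted in the Pseudocode row leaves ObserveLoss and Update abstract. As a concrete illustration only, the sketch below instantiates them with a fixed linear loss over a probability simplex and a regret-matching update; both choices are our assumptions for brevity, not the paper's FD-FTRL(R)/FD-OMD(R) rules.

```python
def regret_matching(losses, T):
    """Run T iterations of the generic loop: observe a loss, then update.

    losses: fixed per-action loss vector (stand-in for ObserveLoss).
    Update plays each action proportionally to its positive cumulative regret.
    Returns the list of strategies x^1, ..., x^T.
    """
    n = len(losses)
    regrets = [0.0] * n
    strategies = []
    x = [1.0 / n] * n  # x^1: uniform
    for t in range(T):
        strategies.append(x[:])
        # ObserveLoss: expected loss of the current strategy
        exp_loss = sum(xi * li for xi, li in zip(x, losses))
        # Accumulate instantaneous regret for each action
        for i in range(n):
            regrets[i] += exp_loss - losses[i]
        # Update: normalize positive regrets; fall back to uniform if none
        pos = [max(r, 0.0) for r in regrets]
        s = sum(pos)
        x = [p / s for p in pos] if s > 0 else [1.0 / n] * n
    return strategies
```

Against a fixed loss, the iterates quickly concentrate on the lowest-loss action, which is the behavior the framework's regret guarantee implies in this degenerate case.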
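The two weighting schemes in the experiment setup differ only in whether the factor is the current iterate t or the horizon T. A minimal sketch, with argument names of our choosing (y_bar for ȳ_j^t, num_actions for |A|):

```python
def lw_weight(eta, y_bar, n_j, num_actions, t):
    """Linear Weighting (LW): λ_j^t = η·ȳ_j^t·n_j·|A|²·t (grows with t)."""
    return eta * y_bar * n_j * num_actions ** 2 * t


def cw_weight(eta, y_bar, n_j, num_actions, T):
    """Constant Weighting (CW): λ_j^t = η·ȳ_j^t·n_j·|A|²·T (fixed over t)."""
    return eta * y_bar * n_j * num_actions ** 2 * T
```

At the final iterate t = T the two schemes coincide; CW simply front-loads the eventual LW value from the start, which matches the motivating bound Σ_{k=1}^{t} ‖r̂_j^k‖₂² = O(ȳ_j^t·n_j·|A|²·t).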
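The two averaging schemes (UA and LA) from the setup row can likewise be written out directly; the toy iterates in the usage note are illustrative, not from the paper.

```python
def uniform_average(iterates):
    """UA: x̄^T = (1/T)·Σ_{t=1}^{T} x^t."""
    T = len(iterates)
    n = len(iterates[0])
    return [sum(x[i] for x in iterates) / T for i in range(n)]


def linear_average(iterates):
    """LA: x̄^T = 2/(T(T+1))·Σ_{t=1}^{T} t·x^t, weighting late iterates more."""
    T = len(iterates)
    n = len(iterates[0])
    coef = 2.0 / (T * (T + 1))
    return [coef * sum((t + 1) * x[i] for t, x in enumerate(iterates))
            for i in range(n)]
```

For the two iterates [1, 0] and [0, 1], UA returns [0.5, 0.5] while LA returns [1/3, 2/3], showing how LA discounts early (typically worse) strategies, the same reason LCFR and CFR+ benefit from it.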