Stable-Predictive Optimistic Counterfactual Regret Minimization

Authors: Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we present the first CFR variant that breaks the square-root dependence on iterations. By combining and extending recent advances on predictive and stable regret minimizers for the matrix-game setting we show that it is possible to leverage optimistic regret minimizers to achieve a O(T^{3/4}) convergence rate within CFR. This is achieved by introducing a new notion of stable-predictivity, and by setting the stability of each counterfactual regret minimizer relative to its location in the decision tree. Experiments show that this method is faster than the original CFR algorithm, although not as fast as newer variants, in spite of their worst-case O(T^{1/2}) dependence on iterations.
Researcher Affiliation | Collaboration | Gabriele Farina (1), Christian Kroer (2), Noam Brown (1), Tuomas Sandholm (1, 3, 4, 5). (1) Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213; (2) IEOR Department, Columbia University, New York, NY 10027; (3) Strategic Machine, Inc.; (4) Strategy Robot, Inc.; (5) Optimized Markets, Inc.
Pseudocode | No | The paper describes algorithms and methods using mathematical notation and textual descriptions but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions 'four open-source subgames solved by Libratus AI' and provides a GitHub link (https://github.com/CMU-EM/LibratusEndgames), but this link is for the external benchmark subgames used in the experiments, not for the authors' own implementation of their proposed CFR variant.
Open Datasets | Yes | We conduct our experiments on four open-source subgames solved by the Libratus AI, which beat top poker professionals (Brown & Sandholm, 2017b). Following prior convention, we use the bet sizes of 0.5× the size of the pot, 1× the size of the pot, and the all-in bet for the first bet of each round. For subsequent bets in a round, we consider 1× the pot and the all-in bet. Subgames: https://github.com/CMU-EM/LibratusEndgames
Dataset Splits | No | The paper describes experiments on 'Heads-up no-limit Texas hold'em poker (HUNL) subgames' and measures 'exploitability', but it does not specify explicit train/validation/test dataset splits for these subgames.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory specifications) used for running the experiments. It refers to 'Libratus AI' as a source of subgames but does not specify the computational resources used for the authors' own experiments.
Software Dependencies | No | The paper mentions various algorithms like 'CFR', 'CFR+', 'Discounted CFR (DCFR)', and 'OFTRL', but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific solver versions) required to replicate the experiments.
Experiment Setup | Yes | DCFR was set up with parameters (α, β, γ) = (1.5, 0, 2), as recommended in the original DCFR paper. For OFTRL we use the stepsize that the theory suggests in our experiments on subgames 3 and 4 (labeled OFTRL theory). For subgames 1 and 2 we found that the theoretically-correct stepsize is much too conservative, so we also show results with a less-conservative parameter found through dividing the stepsize by 10, 100, and 1000, and picking the best among those (labeled OFTRL tuned).
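
The sketches below are added for orientation only and are not taken from the paper. The abstract quoted in the Research Type row builds on optimistic (predictive) regret minimizers; one minimal example of such a minimizer is optimistic FTRL with an entropic regularizer on the probability simplex. The step size `eta` and the standard prediction choice m_t = previous loss are assumptions; this is a generic illustration, not the paper's stable-predictive construction.

```python
import numpy as np

def oftrl_simplex(losses, eta=0.1):
    """Optimistic FTRL over the probability simplex with an entropic
    regularizer. The iterate has the closed form
        x_t  proportional to  exp(-eta * (L_{t-1} + m_t)),
    where L_{t-1} is the sum of past losses and m_t is a prediction of
    the next loss (here the standard choice m_t = previous loss).
    Generic sketch only, not the paper's stable-predictive CFR variant."""
    dim = len(losses[0])
    cum_loss = np.zeros(dim)      # L_{t-1}: accumulated observed losses
    prediction = np.zeros(dim)    # m_t: optimistic guess of the next loss
    iterates = []
    for loss in losses:
        loss = np.asarray(loss, dtype=float)
        logits = -eta * (cum_loss + prediction)
        x = np.exp(logits - logits.max())   # numerically stable softmax
        iterates.append(x / x.sum())
        cum_loss += loss
        prediction = loss                   # predict that the loss repeats
    return iterates
```

When consecutive losses change slowly the prediction is accurate, and the regret of such a minimizer is controlled by the prediction error rather than by the iteration count alone, which is the kind of property the O(T^{3/4}) rate quoted above builds on.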
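
The Pseudocode row notes that the algorithm is presented through mathematical notation and text only. For reference, the per-information-set regret minimizer that vanilla CFR composes over the decision tree is regret matching; a minimal, generic sketch (again not the authors' stable-predictive variant) follows.

```python
import numpy as np

def regret_matching(cumulative_regret):
    """One regret-matching step: play each action with probability
    proportional to its positive cumulative counterfactual regret, and
    fall back to the uniform strategy when no regret is positive.
    Vanilla CFR runs one such minimizer at every information set."""
    regrets = np.asarray(cumulative_regret, dtype=float)
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(regrets.shape, 1.0 / regrets.size)
```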
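
The Experiment Setup row describes choosing the OFTRL step size on subgames 1 and 2 by dividing the theoretical value by 10, 100, and 1000 and keeping the best. A sketch of that selection loop is below; `run_solver` is a hypothetical, user-supplied callable that runs an OFTRL-based solver on a subgame and returns its final exploitability, and is not code from the paper.

```python
def tune_oftrl_stepsize(run_solver, subgame, theoretical_eta):
    """Evaluate the theoretical step size divided by 10, 100, and 1000
    and keep the candidate with the lowest final exploitability,
    mirroring the 'OFTRL tuned' setup described in the table above.
    `run_solver(subgame, eta)` is a placeholder solver callable."""
    candidates = [theoretical_eta / d for d in (10.0, 100.0, 1000.0)]
    scores = {eta: run_solver(subgame, eta) for eta in candidates}
    best_eta = min(scores, key=scores.get)
    return best_eta, scores[best_eta]
```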