Low-Variance and Zero-Variance Baselines for Extensive-Form Games
Authors: Trevor Davis, Martin Schmid, Michael Bowling
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we show empirically that our new baselines result in significantly reduced variance and faster convergence. (Section heading: "5. Experimental comparison") |
| Researcher Affiliation | Collaboration | (1) Alberta Machine Intelligence Institute, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; (2) DeepMind, Edmonton, Alberta, Canada. |
| Pseudocode | Yes | Pseudocode for MCCFR with baseline-corrected values is given in the supplementary materials. |
| Open Source Code | Yes | An open source implementation of CFR+ and Leduc hold'em is available from the University of Alberta (http://webdocs.cs.ualberta.ca/~games/poker/cfr_plus.html). |
| Open Datasets | Yes | We run our experiments using a commodity desktop machine in Leduc hold'em (Southey et al., 2005), a small poker game commonly used as a benchmark in games research. ... using two versions of Generic Poker (r, 4, 4, 1) with r = 6 and r = 13 (Lisý et al., 2015). |
| Dataset Splits | No | The paper uses game environments and evaluates performance directly, but it does not specify explicit training/validation/test dataset splits or other detailed data partitioning methodologies. |
| Hardware Specification | No | We run our experiments using a commodity desktop machine - this is too general and does not provide specific hardware details like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions 'CFR+' and 'MCCFR' as algorithms and 'Leduc hold'em' as a game environment, but it does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Our experiments use the regret zeroing and linear averaging of CFR+, as these improve convergence when combined with any of the nonzero baselines examined in this work. For the static strategy baseline, we use the always call strategy... For both of the learned baselines, we use simple averaging as it performed best in preliminary experiments. We run experiments with two sampling strategies. The first is uniform sampling... The second is opponent on-policy sampling... For consistency, we use alternating updates for both schemes. ... For the learned baselines, we use exponentially-decaying averaging with α = 0.5. |
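
The Pseudocode row above points to the paper's MCCFR-with-baselines pseudocode in the supplementary materials rather than reproducing it. As a rough illustration of the underlying control-variate idea only, the Python sketch below shows how a baseline-corrected value estimate is typically formed at a single decision point in outcome-sampling MCCFR; the function and argument names (`baseline_corrected_values`, `sample_prob`, etc.) and the example numbers are illustrative assumptions, not code from the paper.

```python
def baseline_corrected_values(strategy, baseline, sampled_action,
                              sampled_child_value, sample_prob):
    """Baseline-corrected action-value estimates at one decision point.

    strategy:            dict action -> probability under the current strategy
    baseline:            dict action -> baseline value b(h, a)
    sampled_action:      the single action that was actually sampled
    sampled_child_value: recursive value estimate for the sampled child
    sample_prob:         probability with which sampled_action was drawn
    """
    corrected = {}
    for action in strategy:
        if action == sampled_action:
            # Importance-weighted correction around the baseline: the estimate
            # stays unbiased, while its variance shrinks as b(h, a) gets closer
            # to the true action value.
            corrected[action] = baseline[action] + (
                sampled_child_value - baseline[action]
            ) / sample_prob
        else:
            # Unsampled actions are estimated by the baseline alone.
            corrected[action] = baseline[action]
    # Estimated value of the decision point under the current strategy.
    node_value = sum(strategy[a] * corrected[a] for a in strategy)
    return node_value, corrected


# Hypothetical usage: two actions, "call" sampled with probability 0.5.
value, per_action = baseline_corrected_values(
    strategy={"call": 0.7, "raise": 0.3},
    baseline={"call": 1.0, "raise": -0.5},
    sampled_action="call",
    sampled_child_value=2.0,
    sample_prob=0.5,
)
```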
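
The Experiment Setup row quotes an exponentially-decaying average with α = 0.5 for the learned baselines. A minimal sketch of such a running average over (information set, action) pairs is given below; the class name, table keys, zero initialization, and the exact update convention are assumptions rather than details taken from the paper.

```python
ALPHA = 0.5  # decay rate quoted in the Experiment Setup row


class ExpDecayBaseline:
    """Learned baseline b(I, a) kept as an exponentially-decaying average."""

    def __init__(self, alpha=ALPHA):
        self.alpha = alpha
        self.values = {}  # (infoset, action) -> running baseline

    def get(self, infoset, action):
        # Unvisited pairs default to zero (an assumed initialization).
        return self.values.get((infoset, action), 0.0)

    def update(self, infoset, action, observed_value):
        # b <- (1 - alpha) * b + alpha * observation.  With alpha = 0.5 the two
        # weighting conventions coincide, so the decay direction is immaterial.
        old = self.get(infoset, action)
        self.values[(infoset, action)] = (
            (1.0 - self.alpha) * old + self.alpha * observed_value
        )


# Hypothetical usage: feed back a sampled value estimate for one pair.
b = ExpDecayBaseline()
b.update("I1", "call", 1.95)
print(b.get("I1", "call"))  # 0.975
```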