Low-Variance and Zero-Variance Baselines for Extensive-Form Games

Authors: Trevor Davis, Martin Schmid, Michael Bowling

ICML 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In Section 5, we show empirically that our new baselines result in significantly reduced variance and faster convergence." (Section 5, "Experimental comparison") |
| Researcher Affiliation | Collaboration | "1 Alberta Machine Intelligence Institute, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; 2 DeepMind, Edmonton, Alberta, Canada." |
| Pseudocode | Yes | "Pseudocode for MCCFR with baseline-corrected values is given in the supplementary materials." (a hedged sketch of baseline-corrected values is given below the table) |
| Open Source Code | Yes | "An open source implementation of CFR+ and Leduc hold'em is available from the University of Alberta (http://webdocs.cs.ualberta.ca/~games/poker/cfr_plus.html)." |
| Open Datasets | Yes | "We run our experiments using a commodity desktop machine in Leduc hold'em (Southey et al., 2005), a small poker game commonly used as a benchmark in games research." ... "using two versions of Generic Poker (r, 4, 4, 1) with r = 6 and r = 13 (Lisý et al., 2015)." |
| Dataset Splits | No | The paper evaluates algorithms directly in game environments and does not specify explicit training/validation/test splits or any other data partitioning. |
| Hardware Specification | No | "We run our experiments using a commodity desktop machine" is the only hardware description; no specific CPU/GPU models or memory sizes are given. |
| Software Dependencies | No | The paper names the CFR+ and MCCFR algorithms and the Leduc hold'em environment, but it does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | "Our experiments use the regret zeroing and linear averaging of CFR+, as these improve convergence when combined with any of the nonzero baselines examined in this work. For the static strategy baseline, we use the always call strategy... For both of the learned baselines, we use simple averaging as it performed best in preliminary experiments. We run experiments with two sampling strategies. The first is uniform sampling... The second is opponent on-policy sampling... For consistency, we use alternating updates for both schemes. For the learned baselines, we use exponentially-decaying averaging with α = 0.5." (a sketch of the baseline averaging step is given below the table) |
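The baseline-corrected values referenced in the Pseudocode row follow a control-variate construction for sampled action values in MCCFR. The sketch below is not the paper's supplementary pseudocode; it is a minimal Python illustration assuming a single action is sampled at a decision point, with illustrative names (`baseline_corrected_values`, `sample_prob`).

```python
# Minimal sketch of a baseline-corrected value estimate for sampled MCCFR
# (names and structure are illustrative, not the paper's pseudocode).

def baseline_corrected_values(action_values, baselines, sampled_action, sample_prob):
    """Return estimated values for every action at a decision point.

    action_values : dict mapping the sampled action to its sampled value u(h, a)
    baselines     : dict mapping every action to its baseline b(h, a)
    sampled_action: the single action that was actually sampled
    sample_prob   : probability q(a) with which that action was sampled
    """
    estimates = {}
    for action, b in baselines.items():
        if action == sampled_action:
            # Control-variate correction: unbiased as long as q(a) > 0.
            estimates[action] = b + (action_values[action] - b) / sample_prob
        else:
            # Unsampled actions fall back to the baseline alone.
            estimates[action] = b
    return estimates


# Example: three actions, only "call" was sampled (with probability 1/3).
values = baseline_corrected_values(
    action_values={"call": 4.0},
    baselines={"fold": 0.0, "call": 3.0, "raise": 1.0},
    sampled_action="call",
    sample_prob=1.0 / 3.0,
)
print(values)  # {'fold': 0.0, 'call': 6.0, 'raise': 1.0}
```

A perfect baseline (b equal to the true action value) makes the correction term vanish in expectation and the estimate's variance drop to zero, which is the motivation behind the paper's zero-variance baselines.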
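The Experiment Setup row also quotes exponentially-decaying averaging with α = 0.5 for the learned baselines. The exact update rule is not reproduced in the quote, so the following is only a minimal sketch of one common exponential-averaging form, with hypothetical names (`update_baseline`, `ALPHA`).

```python
# Minimal sketch of an exponentially-decaying average for a learned baseline
# b(h, a), with decay parameter alpha = 0.5 as quoted above. The paper's exact
# update form is not shown here; this illustrates one common choice.

ALPHA = 0.5

def update_baseline(baselines, key, observed_value, alpha=ALPHA):
    """Move the stored baseline toward the newly observed sampled value."""
    old = baselines.get(key, 0.0)
    baselines[key] = (1.0 - alpha) * old + alpha * observed_value
    return baselines[key]


baselines = {}
for v in [4.0, 2.0, 6.0]:
    update_baseline(baselines, ("some_history", "call"), v)
print(baselines[("some_history", "call")])  # 4.0
```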