Low-Variance and Zero-Variance Baselines for Extensive-Form Games
Authors: Trevor Davis, Martin Schmid, Michael Bowling
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we show empirically that our new baselines result in significantly reduced variance and faster convergence. (Section heading: "5. Experimental comparison") |
| Researcher Affiliation | Collaboration | (1) Alberta Machine Intelligence Institute, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; (2) DeepMind, Edmonton, Alberta, Canada. |
| Pseudocode | Yes | Pseudocode for MCCFR with baseline-corrected values is given in the supplementary materials. |
| Open Source Code | Yes | An open source implementation of CFR+ and Leduc hold'em is available from the University of Alberta (http://webdocs.cs.ualberta.ca/~games/poker/cfr_plus.html). |
| Open Datasets | Yes | We run our experiments using a commodity desktop machine in Leduc hold'em (Southey et al., 2005), a small poker game commonly used as a benchmark in games research. ... using two versions of Generic Poker (r, 4, 4, 1) with r = 6 and r = 13 (Lisý et al., 2015). |
| Dataset Splits | No | The paper uses game environments and evaluates performance directly, but it does not specify explicit training/validation/test dataset splits or other detailed data partitioning methodologies. |
| Hardware Specification | No | We run our experiments using a commodity desktop machine - this is too general and does not provide specific hardware details like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions 'CFR+' and 'MCCFR' as algorithms and 'Leduc hold'em' as a game environment, but it does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Our experiments use the regret zeroing and linear averaging of CFR+, as these improve convergence when combined with any of the nonzero baselines examined in this work. For the static strategy baseline, we use the always call strategy... For both of the learned baselines, we use simple averaging as it performed best in preliminary experiments. We run experiments with two sampling strategies. The first is uniform sampling... The second is opponent on-policy sampling... For consistency, we use alternating updates for both schemes. ... For the learned baselines, we use exponentially-decaying averaging with α = 0.5. |
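
The Pseudocode row above points to the paper's MCCFR-with-baselines pseudocode in the supplementary materials rather than reproducing it. As a rough illustration of the underlying control-variate idea only, the Python sketch below shows how a baseline-corrected value estimate is typically formed at a single decision point in outcome-sampling MCCFR; the function and argument names (`baseline_corrected_values`, `sample_prob`, etc.) and the example numbers are illustrative assumptions, not code from the paper.

```python
def baseline_corrected_values(strategy, baseline, sampled_action,
                              sampled_child_value, sample_prob):
    """Baseline-corrected action-value estimates at one decision point.

    strategy:            dict action -> probability under the current strategy
    baseline:            dict action -> baseline value b(h, a)
    sampled_action:      the single action that was actually sampled
    sampled_child_value: recursive value estimate for the sampled child
    sample_prob:         probability with which sampled_action was drawn
    """
    corrected = {}
    for action in strategy:
        if action == sampled_action:
            # Importance-weighted correction around the baseline: the estimate
            # stays unbiased, while its variance shrinks as b(h, a) gets closer
            # to the true action value.
            corrected[action] = baseline[action] + (
                sampled_child_value - baseline[action]
            ) / sample_prob
        else:
            # Unsampled actions are estimated by the baseline alone.
            corrected[action] = baseline[action]
    # Estimated value of the decision point under the current strategy.
    node_value = sum(strategy[a] * corrected[a] for a in strategy)
    return node_value, corrected


# Hypothetical usage: two actions, "call" sampled with probability 0.5.
value, per_action = baseline_corrected_values(
    strategy={"call": 0.7, "raise": 0.3},
    baseline={"call": 1.0, "raise": -0.5},
    sampled_action="call",
    sampled_child_value=2.0,
    sample_prob=0.5,
)
```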
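
The Experiment Setup row quotes an exponentially-decaying average with α = 0.5 for the learned baselines. A minimal sketch of such a running average over (information set, action) pairs is given below; the class name, table keys, zero initialization, and the exact update convention are assumptions rather than details taken from the paper.

```python
ALPHA = 0.5  # decay rate quoted in the Experiment Setup row


class ExpDecayBaseline:
    """Learned baseline b(I, a) kept as an exponentially-decaying average."""

    def __init__(self, alpha=ALPHA):
        self.alpha = alpha
        self.values = {}  # (infoset, action) -> running baseline

    def get(self, infoset, action):
        # Unvisited pairs default to zero (an assumed initialization).
        return self.values.get((infoset, action), 0.0)

    def update(self, infoset, action, observed_value):
        # b <- (1 - alpha) * b + alpha * observation.  With alpha = 0.5 the two
        # weighting conventions coincide, so the decay direction is immaterial.
        old = self.get(infoset, action)
        self.values[(infoset, action)] = (
            (1.0 - self.alpha) * old + self.alpha * observed_value
        )


# Hypothetical usage: feed back a sampled value estimate for one pair.
b = ExpDecayBaseline()
b.update("I1", "call", 1.95)
print(b.get("I1", "call"))  # 0.975
```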