Solving Large Extensive-Form Games with Strategy Constraints

Authors: Trevor Davis, Kevin Waugh, Michael Bowling
Pages: 1861-1868

AAAI 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental evaluation. We present two domains for experimental evaluation in this paper. In the first, we use constraints to model a secondary objective when generating strategies in a model security game. In the second domain, we use constraints for opponent modeling in a small poker game. We demonstrate that using constraints for modeling data allows us to learn counter-strategies that approach optimal counter-strategies as the amount of data increases. |
| Researcher Affiliation | Collaboration | Trevor Davis (1), Kevin Waugh (2), Michael Bowling (2, 1). (1) Department of Computing Science, University of Alberta; (2) DeepMind |
| Pseudocode | Yes | Algorithm 1 Constrained CFR |
| Open Source Code | No | "The transit game experiments were implemented with code made publicly available by the game theory group of the Artificial Intelligence Center at Czech Technical University in Prague." This refers to third-party code the authors used, not to an implementation of their proposed CCFR algorithm. The paper gives no explicit statement or link for the authors' own code. |
| Open Datasets | Yes | We ran our experiments in Leduc Hold'em (Southey et al. 2005), a small poker game played with a six card deck over two betting rounds. |
| Dataset Splits | No | The paper describes generating constraints from observed games and evaluating performance as the number of observed games increases, but it does not specify explicit training, validation, or test dataset splits, nor does it mention cross-validation. |
| Hardware Specification | No | The paper states "Computing resources were provided by Compute Canada and Calcul Québec." but does not specify any particular hardware, such as GPU models, CPU models, or memory, used for the experiments. |
| Software Dependencies | Yes | by comparing its produced strategies with strategies produced by solving the LP representation of the game with the simplex solver in IBM ILOG CPLEX 12.7.1. |
| Experiment Setup | Yes | We update the CCFR constraint weights λ using stochastic gradient ascent with constant step size α_t = 1, which we found to work well across a variety of game sizes and risk bounds. |
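
The Experiment Setup entry mentions updating the constraint weights λ by gradient ascent with a constant step size of 1. As a rough illustration only, and not the authors' implementation, one such step could look like the sketch below. It assumes each constraint is written as f_i(strategy) <= 0, so a positive value means a violation, and that the weights are projected back onto a box [0, upper_bound]. The function name, the `violations` input, and the `upper_bound` parameter are all hypothetical.

```python
from typing import List, Optional


def update_constraint_weights(
    weights: List[float],
    violations: List[float],
    step_size: float = 1.0,              # constant step size, as quoted above
    upper_bound: Optional[float] = None, # hypothetical cap on each weight
) -> List[float]:
    """One projected gradient-ascent step on the constraint weights.

    violations[i] is the (possibly sampled) value of constraint i at the
    current strategy; under the assumed convention, positive means violated.
    """
    updated = []
    for weight, violation in zip(weights, violations):
        # Ascent step: a violated constraint gets a larger weight.
        new_weight = weight + step_size * violation
        # Projection: weights are kept non-negative (assumed here).
        new_weight = max(0.0, new_weight)
        if upper_bound is not None:
            new_weight = min(upper_bound, new_weight)
        updated.append(new_weight)
    return updated


if __name__ == "__main__":
    # First constraint is violated, second is satisfied with slack.
    print(update_constraint_weights([0.0, 2.0], [0.5, -3.0]))
```

In this sketch a weight grows while its constraint is violated and shrinks toward zero once the constraint holds with slack, which is the usual behavior of a Lagrange-multiplier style update.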