Solving Large Extensive-Form Games with Strategy Constraints
Authors: Trevor Davis, Kevin Waugh, Michael Bowling (pp. 1861-1868)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental evaluation: We present two domains for experimental evaluation in this paper. In the first, we use constraints to model a secondary objective when generating strategies in a model security game. In the second domain, we use constraints for opponent modeling in a small poker game. We demonstrate that using constraints for modeling data allows us to learn counter-strategies that approach optimal counter-strategies as the amount of data increases. |
| Researcher Affiliation | Collaboration | Trevor Davis (1), Kevin Waugh (2), Michael Bowling (2, 1). (1) Department of Computing Science, University of Alberta; (2) DeepMind |
| Pseudocode | Yes | Algorithm 1 Constrained CFR |
| Open Source Code | No | The transit game experiments were implemented with code made publicly available by the game theory group of the Artificial Intelligence Center at Czech Technical University in Prague. This refers to third-party code the authors used, not to an implementation of their proposed CCFR algorithm. The paper gives no explicit statement or link for the authors' own code. |
| Open Datasets | Yes | We ran our experiments in Leduc Hold'em (Southey et al. 2005), a small poker game played with a six-card deck over two betting rounds. |
| Dataset Splits | No | The paper describes generating constraints from observed games and evaluating performance as the number of observed games increases, but it does not specify explicit training, validation, or test dataset splits, nor does it mention cross-validation. |
| Hardware Specification | No | The paper states 'Computing resources were provided by Compute Canada and Calcul Québec.' but does not specify any particular hardware components such as GPU models, CPU models, or memory details used for the experiments. |
| Software Dependencies | Yes | by comparing its produced strategies with strategies produced by solving the LP representation of the game with the simplex solver in IBM ILOG CPLEX 12.7.1. |
| Experiment Setup | Yes | We update the CCFR constraint weights λ using stochastic gradient ascent with constant step size α_t = 1, which we found to work well across a variety of game sizes and risk bounds. |
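
The Experiment Setup row describes updating the constraint weights λ by gradient ascent with a constant step size of 1. The following is a minimal sketch of what one such update step could look like, not the authors' implementation. The function name `update_constraint_weights`, the use of per-constraint violation values as the ascent direction, and the projection onto `[0, upper_bound]` are all assumptions made for illustration.

```python
from typing import List, Optional


def update_constraint_weights(
    weights: List[float],
    violations: List[float],
    step_size: float = 1.0,               # constant step size, as quoted in the table
    upper_bound: Optional[float] = None,  # assumed cap on each weight (hypothetical)
) -> List[float]:
    """Take one gradient ascent step on the constraint weights.

    `violations[i]` is assumed to be a (possibly sampled) estimate of how far
    constraint i is from being satisfied; positive means violated.
    """
    updated = []
    for weight, violation in zip(weights, violations):
        new_weight = weight + step_size * violation
        # Assumed projection: weights stay non-negative ...
        new_weight = max(0.0, new_weight)
        # ... and, optionally, below a fixed cap.
        if upper_bound is not None:
            new_weight = min(upper_bound, new_weight)
        updated.append(new_weight)
    return updated


lam = [0.0, 0.0]
lam = update_constraint_weights(lam, [0.5, -0.25])
lam = update_constraint_weights(lam, [1.0, 0.75], upper_bound=1.0)
```

In this sketch a violated constraint raises its weight and a satisfied one lowers it, with the projection keeping every weight inside the assumed feasible range.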