Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games
Authors: Brian Hu Zhang, Tuomas Sandholm
AAAI 2021, pp. 5779-5788
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally that, in the black-box setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree. We conducted experiments on two common benchmarks: (1) k-rank Goofspiel. (2) k-rank heads-up limit Leduc poker (Southey et al. 2005). |
| Researcher Affiliation | Collaboration | Computer Science Department, Carnegie Mellon University; Strategic Machine, Inc.; Strategy Robot, Inc.; Optimized Markets, Inc. |
| Pseudocode | Yes | Algorithm 6.1 Two-player zero-sum certificate finding; Algorithm 7.2 Certificate-finding with regret minimization |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We conducted experiments on two common benchmarks: (1) k-rank Goofspiel. (2) k-rank heads-up limit Leduc poker (Southey et al. 2005). |
| Dataset Splits | No | The paper discusses 'playthroughs' and 'samples' from a black-box simulator but does not describe conventional training, validation, and test dataset splits with specific percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments, only mentioning the use of Gurobi for LP solving. |
| Software Dependencies | Yes | We use Gurobi v9.0.0 (Gurobi Optimization, LLC 2019) as the LP solver. |
| Experiment Setup | Yes | Since the LP solves are relatively expensive, we only recompute the LP solution every 100 playthroughs sampled. This does not change the asymptotic performance of the algorithm. To be consistent with the other algorithms, one iteration of MCCFR consists of one accepted loss vector per player. For the other algorithms, one iteration is one playthrough. (A hedged sketch of this batched LP recompute loop follows the table.) |
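To make the Experiment Setup row concrete, the following is a minimal sketch of the batched loop it describes: expand the partial game tree with sampled playthroughs from the black-box simulator, and recompute the certificate LP only every 100 playthroughs. All identifiers here (`sample_playthrough`, `solve_certificate_lp`, the placeholder bound formula) are hypothetical stand-ins, not the authors' code; the paper solves the actual certificate LP with Gurobi v9.0.0.

```python
# Hypothetical sketch of the batched LP-recompute loop from the
# Experiment Setup row. None of these functions are the authors' API.

import random


def solve_certificate_lp(tree):
    """Stand-in for the certificate LP solve (the paper uses Gurobi v9.0.0).

    A real implementation would build and solve the LP over the currently
    expanded portion of the game tree to certify an exploitability bound.
    The formula below is a placeholder so the sketch runs end to end.
    """
    return max(0.0, 1.0 - 0.001 * len(tree))


def sample_playthrough(rng):
    """Stand-in for one playthrough of the black-box simulator."""
    # A playthrough is represented here as a random action sequence.
    return tuple(rng.randrange(3) for _ in range(4))


def run(num_playthroughs=1000, lp_every=100, seed=0):
    rng = random.Random(seed)
    tree = set()           # expanded portion of the game tree
    bound = float("inf")   # current certified exploitability bound
    for t in range(1, num_playthroughs + 1):
        tree.add(sample_playthrough(rng))  # expand along the sampled path
        # LP solves are expensive, so recompute the certificate only every
        # `lp_every` playthroughs (100 in the paper); this leaves the
        # algorithm's asymptotic performance unchanged.
        if t % lp_every == 0:
            bound = solve_certificate_lp(tree)
    return bound


if __name__ == "__main__":
    print(f"certified exploitability bound: {run():.4f}")
```

The design point the row makes is simply amortization: the certificate bound only tightens as the tree grows, so recomputing it once per batch of playthroughs trades bound freshness for far fewer expensive LP solves.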