Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games

Authors: Brian Hu Zhang, Tuomas Sandholm

AAAI 2021, pp. 5779-5788

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate experimentally that, in the black-box setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree. We conducted experiments on two common benchmarks: (1) k-rank Goofspiel. (2) k-rank heads-up limit Leduc poker (Southey et al. 2005).
Researcher Affiliation | Collaboration | (1) Computer Science Department, Carnegie Mellon University; (2) Strategic Machine, Inc.; (3) Strategy Robot, Inc.; (4) Optimized Markets, Inc.
Pseudocode | Yes (see sketch below) | Algorithm 6.1 Two-player zero-sum certificate finding; Algorithm 7.2 Certificate-finding with regret minimization
Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes (see loading sketch below) | We conducted experiments on two common benchmarks: (1) k-rank Goofspiel. (2) k-rank heads-up limit Leduc poker (Southey et al. 2005)
Dataset Splits | No | The paper discusses 'playthroughs' and 'samples' from a black-box simulator but does not describe conventional training, validation, and test dataset splits with specific percentages or counts.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments, only mentioning the use of Gurobi for LP solving.
Software Dependencies | Yes (see LP sketch below) | We use Gurobi v9.0.0 (Gurobi Optimization, LLC 2019) as the LP solver.
Experiment Setup | Yes (see loop sketch below) | Since the LP solves are relatively expensive, we only recompute the LP solution every 100 playthroughs sampled. This does not change the asymptotic performance of the algorithm. To be consistent with the other algorithms, one iteration of MCCFR consists of one accepted loss vector per player. For the other algorithms, one iteration is one playthrough.
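
To give a rough sense of what the cited pseudocode does, the following is a minimal, hypothetical Python sketch of the loop that Algorithm 6.1 (two-player zero-sum certificate finding) describes: solve the partially expanded pseudogame for optimistic and pessimistic value bounds, expand it with black-box playthroughs, and stop once the gap certifies near-optimality. The argument names are placeholders, not the authors' implementation.

```python
# Hypothetical sketch of the shape of Algorithm 6.1 (two-player zero-sum
# certificate finding). `solve_pseudogame_bounds` and `sample_and_expand`
# are placeholder callables standing in for the paper's subroutines.
def find_certificate(solve_pseudogame_bounds, sample_and_expand, epsilon):
    """Expand a pseudogame from black-box samples until the gap between
    its optimistic and pessimistic solutions is at most epsilon."""
    while True:
        # Solve the partially expanded game twice: once treating unexpanded
        # subtrees pessimistically (lower bound on the achievable value) and
        # once optimistically (upper bound).
        lower, upper, strategies = solve_pseudogame_bounds()
        if upper - lower <= epsilon:
            # The gap bounds exploitability, so the strategies are certified
            # near-optimal without expanding the full game tree.
            return strategies, (lower, upper)
        # Otherwise sample a playthrough from the black-box simulator and
        # expand the pseudogame along the nodes it visits.
        sample_and_expand(strategies)
```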
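
Both benchmarks are standard research games. As an illustration only, comparable instances can be loaded from OpenSpiel; this is an assumption, since the paper does not say which implementation was used, and OpenSpiel's `leduc_poker` defaults to the standard 3-rank game while the paper varies the rank parameter k.

```python
# Illustrative only: loading comparable benchmark games via OpenSpiel
# (an assumption; the paper does not state which implementations were used).
import pyspiel

# Imperfect-information Goofspiel; k = 4 ranks is an arbitrary choice here.
goofspiel = pyspiel.load_game("goofspiel", {"num_cards": 4, "imp_info": True})

# Heads-up limit Leduc poker (Southey et al. 2005); OpenSpiel's default is
# the standard 3-rank variant.
leduc = pyspiel.load_game("leduc_poker")

for game in (goofspiel, leduc):
    print(game.get_type().short_name, "actions:", game.num_distinct_actions())
```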
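
The only stated software dependency is Gurobi v9.0.0 as the LP solver. As a self-contained illustration of that kind of solve, here is a small zero-sum matrix-game LP in gurobipy; the paper's actual LPs are over (pseudo)game strategy spaces, so the toy payoff matrix below is purely illustrative.

```python
# Toy example: computing a maximin strategy of a zero-sum matrix game with
# Gurobi, illustrating the kind of LP solve the experiments rely on.
import gurobipy as gp
from gurobipy import GRB

A = [[1.0, -1.0], [-1.0, 1.0]]  # row player's payoff matrix (matching pennies)
m, n = len(A), len(A[0])

model = gp.Model("zero_sum_lp")
model.Params.OutputFlag = 0                    # silence solver logging
x = model.addVars(m, lb=0.0, name="x")         # row player's mixed strategy
v = model.addVar(lb=-GRB.INFINITY, name="v")   # game value
model.addConstr(gp.quicksum(x[i] for i in range(m)) == 1.0)
for j in range(n):
    # Expected payoff against every pure column response is at least v.
    model.addConstr(gp.quicksum(A[i][j] * x[i] for i in range(m)) >= v)
model.setObjective(v, GRB.MAXIMIZE)
model.optimize()

print("value:", v.X, "strategy:", [x[i].X for i in range(m)])
```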
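
Lastly, a structural sketch of the stated setup detail that the LP solution is recomputed only once per 100 sampled playthroughs; `resolve_lp` and `sample_and_expand` are hypothetical placeholders, and the playthrough budget is illustrative.

```python
# Structural sketch of the reported schedule: LP solves are expensive, so the
# LP solution is refreshed only every 100 sampled playthroughs. The callables
# are hypothetical placeholders, not the authors' code.
LP_RESOLVE_INTERVAL = 100

def run_experiment(resolve_lp, sample_and_expand, num_playthroughs):
    strategies = None
    for t in range(num_playthroughs):
        if t % LP_RESOLVE_INTERVAL == 0:
            # Expensive step: re-solve the LP over the current pseudogame.
            strategies = resolve_lp()
        # Cheap step: one playthrough from the black-box simulator, expanding
        # the pseudogame along the sampled trajectory.
        sample_and_expand(strategies)
    return resolve_lp()  # final solve on everything expanded so far
```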