Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium
Authors: Gabriele Farina, Tuomas Sandholm
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold 'em poker. We show that one-sided QPE can be computed more efficiently than all known prior refinements, paving the way to wider adoption of Nash equilibrium refinements in settings with perfectly rational machines (or humans perfectly actuating machine-generated strategies) that interact with players prone to mistakes. We also show that one-sided QPE tends to play better than a Nash equilibrium strategy against imperfect opponents. We compare one-sided QPEs against EFPE and (Miltersen-Sørensen) QPE, along two metrics: 1) the time required to compute the refinement, and 2) how the refinement fares against imperfect opponents, when compared to an exact but potentially unrefined Nash equilibrium computed by the two state-of-the-art linear programming solvers CPLEX and Gurobi. We implemented from scratch the algorithm by Farina et al. [7] to solve the trembling linear programs corresponding to the three equilibrium refinements. |
| Researcher Affiliation | Collaboration | Gabriele Farina (Computer Science Department, Carnegie Mellon University; gfarina@cs.cmu.edu); Tuomas Sandholm (Computer Science Department, CMU; Strategy Robot, Inc.; Strategic Machine, Inc.; Optimized Markets, Inc.; sandholm@cs.cmu.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links to source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | We present extensive experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold 'em poker. The six standard benchmark games are: three instances of Leduc poker [29] of increasing size, one relatively large Goofspiel game [27], Liar's Dice, and one real river endgame from the Brains vs AI competition that was played by the Libratus AI. A description of each game is available in the appendix. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split files). It mentions 'CFR for 10000 iterations' to generate imperfect opponents, but this is a method for generating opponent strategies, not a standard dataset split. |
| Hardware Specification | Yes | Our implementation... was run on a machine with 32GB of RAM and an Intel processor running at a nominal speed of 2.4GHz per core. |
| Software Dependencies | No | We start from the value ϵ = 10⁻⁶ and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10⁻⁶, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating point numbers via GNU's MPFR library. From there onward... the value of ϵ is decreased by a factor of 1000 and solved again with our modified version of GLOP... The basis stability oracle is implemented using rational precision... We use GNU's GMP library to implement rational arithmetic. The paper names several software components (CPLEX, Gurobi, GLOP, MPFR, GMP) but does not provide version numbers for any of them. (A sketch of this precision stack, using Python analogues, follows the table.) |
| Experiment Setup | Yes | We start from the value ϵ = 10⁻⁶ and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10⁻⁶, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating point numbers via GNU's MPFR library. From there onward, after every unsuccessful iteration of the algorithm (that is, where the basis is not stable), the value of ϵ is decreased by a factor of 1000 and the program is solved again with our modified version of GLOP, until a stable basis is found. In the river endgame we implemented the sparsification technique described in Zhang and Sandholm [33] to bring the number of nonzeros of the payoff matrix down from 21 million to roughly 167 thousand combined nonzeros when solving the linear program at each iteration. (A sketch of this ϵ schedule appears below the table.) |
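
The precision stack in the Software Dependencies row (GLOP rebuilt on 1000-digit MPFR floats, plus a basis-stability oracle in exact GMP rationals) has direct Python analogues. A minimal sketch, assuming `mpmath` stands in for MPFR and the standard-library `fractions` module for GMP; these are illustrative substitutes, not the authors' actual toolchain:

```python
from fractions import Fraction    # exact rational arithmetic (the role GMP plays in the paper)
from mpmath import mp, mpf, nstr  # arbitrary-precision floats (the role MPFR plays)

# Mirror the modified GLOP build: 1000 significant decimal digits.
mp.dps = 1000

eps = mpf(10) ** -6   # starting trembling magnitude, ϵ = 10⁻⁶
print(nstr(eps, 10))  # displays ≈1.0e-6, carried internally at 1000-digit precision

# The basis-stability oracle runs in exact rational arithmetic, so
# equality tests like this are immune to floating-point round-off:
assert Fraction(1, 3) + Fraction(1, 6) == Fraction(1, 2)
```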
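
The ϵ schedule in the Experiment Setup row is itself a small algorithm. The sketch below follows that description; `solve_lp_gurobi`, `solve_lp_glop_1000digit`, and `basis_is_stable` are hypothetical stand-ins for the authors' solver bindings and exact-rational stability oracle (no code is released, so these names are assumptions):

```python
from fractions import Fraction

def solve_trembling_lp(lp, solve_lp_gurobi, solve_lp_glop_1000digit, basis_is_stable):
    """Sketch of the reported schedule: one Gurobi attempt at ϵ = 10⁻⁶,
    then 1000-digit GLOP re-solves, shrinking ϵ by a factor of 1000 after
    each unstable iteration, until the optimal basis is stable."""
    eps = Fraction(1, 10**6)                      # ϵ = 10⁻⁶, kept exact
    basis = solve_lp_gurobi(lp, eps)              # first attempt: off-the-shelf Gurobi
    while not basis_is_stable(lp, basis):         # exact-rational check (GMP in the paper)
        basis = solve_lp_glop_1000digit(lp, eps)  # high-precision re-solve at current ϵ
        eps /= 1000                               # next attempt, if needed, uses ϵ / 1000
    return basis
```

Keeping ϵ as an exact `Fraction` mirrors why the authors drop to rational arithmetic for the stability check: the loop's termination decision should not hinge on floating-point round-off.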