Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium

Authors: Gabriele Farina, Tuomas Sandholm

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold'em poker. We show that one-sided QPE can be computed more efficiently than all known prior refinements, paving the way to wider adoption of Nash equilibrium refinements in settings with perfectly rational machines (or humans perfectly actuating machine-generated strategies) that interact with players prone to mistakes. We also show that one-sided QPE tends to play better than a Nash equilibrium strategy against imperfect opponents. We compare one-sided QPEs against EFPE and (Miltersen-Sørensen) QPE along two metrics: 1) the time required to compute the refinement, and 2) how the refinement fares against imperfect opponents, when compared to an exact but potentially unrefined Nash equilibrium computed by the two state-of-the-art linear programming solvers CPLEX and Gurobi. We implemented from scratch the algorithm by Farina et al. [7] to solve the trembling linear programs corresponding to the three equilibrium refinements. (A hedged sketch of a sequence-form Nash LP baseline follows the table.)
Researcher Affiliation | Collaboration | Gabriele Farina, Computer Science Department, Carnegie Mellon University (gfarina@cs.cmu.edu); Tuomas Sandholm, Computer Science Department, CMU; Strategy Robot, Inc.; Strategic Machine, Inc.; Optimized Markets, Inc. (sandholm@cs.cmu.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any specific links to source code or explicitly state that the code is publicly available.
Open Datasets | Yes | We present extensive experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold'em poker. Six standard benchmark games: three instances of Leduc poker [29] of increasing size, one relatively large Goofspiel game [27], Liar's Dice, and one real river endgame from the Brains vs. AI competition that was played by the Libratus AI. A description of each game is available in the appendix.
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split files). It mentions "CFR for 10000 iterations" to generate imperfect opponents, but this is a method for generating opponent strategies, not a standard dataset split.
Hardware Specification | Yes | Our implementation... was run on a machine with 32GB of RAM and an Intel processor running at a nominal speed of 2.4GHz per core.
Software Dependencies | No | We start from the value ϵ = 10^-6 and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10^-6, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating-point numbers via GNU's MPFR library. From there onward... the value of ϵ is decreased by a factor of 1000 and solved again with our modified version of GLOP... The basis stability oracle is implemented using rational precision... We use GNU's GMP library to implement rational arithmetic. The paper mentions several software components (CPLEX, Gurobi, GLOP, MPFR, GMP) but does not provide specific version numbers for any of them. (An illustrative rational-arithmetic sketch follows the table.)
Experiment Setup | Yes | We start from the value ϵ = 10^-6 and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10^-6, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating-point numbers via GNU's MPFR library. From there onward, after every unsuccessful iteration of the algorithm (that is, where the basis is not stable), the value of ϵ is decreased by a factor of 1000 and solved again with our modified version of GLOP, until a stable basis is found. In the river endgame we implemented the sparsification technique described in Zhang and Sandholm [33] to bring down the number of nonzeros of the payoff matrix from 21 million to roughly 167 thousand combined nonzeros in the sparsification when solving the linear program at each iteration. (A minimal sketch of this ϵ-annealing loop follows the table.)
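
The ϵ-annealing procedure quoted in the Experiment Setup row can be summarized in code. The following is a minimal sketch, not the authors' implementation: the callables solve_fast, solve_high_precision, and basis_is_stable are hypothetical stand-ins for the paper's Gurobi solve, the modified 1000-digit-precision GLOP solve, and the exact basis-stability oracle, respectively.

```python
# Minimal sketch of the epsilon-annealing loop described in the Experiment Setup row.
# The three callables are assumptions standing in for the paper's actual solvers/oracle.
from typing import Any, Callable, Tuple

def anneal_trembling_lp(
    solve_fast: Callable[[float], Tuple[Any, Any]],            # e.g. a Gurobi solve: eps -> (solution, basis)
    solve_high_precision: Callable[[float], Tuple[Any, Any]],  # e.g. a 1000-digit-precision GLOP solve
    basis_is_stable: Callable[[Any], bool],                    # exact (rational-arithmetic) stability oracle
    eps0: float = 1e-6,
    shrink: float = 1e-3,
    max_iters: int = 20,
):
    """Solve the trembling LP for decreasing eps until a stable basis is found."""
    eps = eps0
    solution, basis = solve_fast(eps)           # first attempt with the fast floating-point solver
    if basis_is_stable(basis):
        return solution, eps
    for _ in range(max_iters):
        solution, basis = solve_high_precision(eps)  # re-solve in high precision
        if basis_is_stable(basis):
            return solution, eps
        eps *= shrink                           # unsuccessful iteration: decrease eps by a factor of 1000
    raise RuntimeError("no stable basis found within the iteration budget")
```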
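The basis-stability oracle is implemented with exact rational arithmetic (GNU GMP in the paper). Purely as an illustration of that style of computation, the sketch below solves a hypothetical basis system B z = b exactly over the rationals with Python's fractions module; it is not the authors' GMP-based code and assumes the basis matrix is nonsingular.

```python
# Illustration only: exact rational solve of a (nonsingular) basis system B z = b,
# in the spirit of the paper's GMP-based rational-precision checks.
from fractions import Fraction

def solve_exact(B, b):
    """Gauss-Jordan elimination over the rationals; returns z with B z = b exactly."""
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)  # pivot search (assumes nonsingular B)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Example: solve_exact([[2, 1], [1, 3]], [3, 5]) == [Fraction(4, 5), Fraction(7, 5)]
```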
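The unrefined Nash equilibrium baselines are reported as computed with CPLEX and Gurobi. The paper does not spell out the formulation, but a standard choice for two-player zero-sum extensive-form games is von Stengel's sequence-form LP. The gurobipy sketch below is an assumption-laden illustration: the sequence-form payoff matrix A and constraint data (E, e) for player 1 and (F, f) for player 2 are taken as given inputs, and this is not claimed to be the authors' exact setup.

```python
# Hedged sketch: an (unrefined) sequence-form Nash equilibrium LP solved with Gurobi.
# A[i, j] is player 1's chance-weighted payoff for sequence pair (i, j);
# E x = e, x >= 0 and F y = f, y >= 0 define the players' realization plans.
import gurobipy as gp
from gurobipy import GRB

def sequence_form_nash(A, E, e, F, f):
    """Solve max_x min_y x^T A y over realization plans (von Stengel's LP)."""
    n1 = E.shape[1]   # number of player-1 sequences
    k2 = F.shape[0]   # number of player-2 infosets + 1 (dual variables)

    m = gp.Model("sequence_form_nash")
    x = m.addMVar(n1, lb=0.0, name="x")               # player-1 realization plan
    v = m.addMVar(k2, lb=-GRB.INFINITY, name="v")     # dual of player-2's best response

    m.setObjective(f @ v, GRB.MAXIMIZE)
    m.addConstr(E @ x == e, name="realization_plan")  # x is a valid realization plan
    m.addConstr(A.T @ x - F.T @ v >= 0, name="best_response_dual")
    m.optimize()
    return x.X, m.ObjVal
```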