Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium
Authors: Gabriele Farina, Tuomas Sandholm
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold'em poker. We show that one-sided QPE can be computed more efficiently than all known prior refinements, paving the way to wider adoption of Nash equilibrium refinements in settings with perfectly rational machines (or humans perfectly actuating machine-generated strategies) that interact with players prone to mistakes. We also show that one-sided QPE tends to play better than a Nash equilibrium strategy against imperfect opponents. We compare one-sided QPEs against EFPE and (Miltersen-Sorensen) QPE, along two metrics: 1) the time required to compute the refinement, and 2) how the refinement fares against imperfect opponents, when compared to an exact but potentially unrefined Nash equilibrium computed by the two state-of-the-art linear programming solvers CPLEX and Gurobi. We implemented from scratch the algorithm by Farina et al. [7] to solve the trembling linear programs corresponding to the three equilibrium refinements. |
| Researcher Affiliation | Collaboration | Gabriele Farina, Computer Science Department, Carnegie Mellon University, EMAIL; Tuomas Sandholm, Computer Science Department, CMU; Strategy Robot, Inc.; Strategic Machine, Inc.; Optimized Markets, Inc., EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links to source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | We present extensive experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold'em poker. Six standard benchmark games: three instances of Leduc poker [29] of increasing size, one relatively large Goofspiel game [27], Liar's Dice, and one real river endgame from the Brains vs AI competition that was played by the Libratus AI. A description of each game is available in the appendix. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split files). It mentions 'CFR for 10000 iterations' to generate imperfect opponents, but this is a method for generating opponent strategies, not a standard dataset split. |
| Hardware Specification | Yes | Our implementation... was run on a machine with 32GB of RAM and an Intel processor running at a nominal speed of 2.4GHz per core. |
| Software Dependencies | No | We start from the value ϵ = 10⁻⁶ and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10⁻⁶, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating point numbers via GNU's MPFR library. From there onward... the value of ϵ is decreased by a factor 1000 and solved again with our modified version of GLOP... The basis stability oracle is implemented using rational precision... We use GNU's GMP library to implement rational arithmetic. The paper mentions several software components (CPLEX, Gurobi, GLOP, MPFR, GMP) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We start from the value ϵ = 10⁻⁶ and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10⁻⁶, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating point numbers via GNU's MPFR library. From there onward, after every unsuccessful iteration of the algorithm (that is, where the basis is not stable), the value of ϵ is decreased by a factor 1000 and solved again with our modified version of GLOP, until a stable basis is found. In the river endgame we implemented the sparsification technique described in Zhang and Sandholm [33] to bring down the number of nonzeros of the payoff matrix from 21 million to roughly 167 thousand combined nonzeros in the sparsification when solving the linear program at each iteration. (A sketch of this ϵ schedule follows the table.) |
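
The Experiment Setup row walks through an ϵ-decreasing resolve schedule for the trembling linear programs. The following is a minimal Python sketch of that schedule only; `build_lp`, `solve_gurobi`, `solve_glop_hp`, and `basis_is_stable` are hypothetical placeholders standing in for the paper's solver plumbing (Gurobi, the high-precision GLOP build, and the rational-arithmetic stability oracle), not the authors' actual implementation.

```python
from fractions import Fraction

def solve_trembling_lp(build_lp, solve_gurobi, solve_glop_hp, basis_is_stable):
    """Sketch of the epsilon schedule quoted above (all callables hypothetical).

    build_lp(eps)       -> the perturbed LP for trembling magnitude eps
    solve_gurobi(lp)    -> (basis, solution), double-precision solve
    solve_glop_hp(lp)   -> (basis, solution), ~1000-digit-precision solve
    basis_is_stable(b)  -> exact stability check (rational arithmetic)
    """
    eps = Fraction(1, 10**6)                    # start from eps = 10^-6
    basis, sol = solve_gurobi(build_lp(eps))    # first solve: Gurobi
    if basis_is_stable(basis):
        return sol
    # Same eps, but re-solved with the high-precision GLOP build.
    basis, sol = solve_glop_hp(build_lp(eps))
    while not basis_is_stable(basis):
        eps /= 10**3                            # shrink eps by a factor of 1000
        basis, sol = solve_glop_hp(build_lp(eps))
    return sol                                  # stable basis found
```

Using `Fraction` for ϵ loosely mirrors the paper's reliance on exact rational arithmetic (via GMP) in the basis stability oracle, though in the real implementation the precision handling lives inside the modified LP solver rather than in driver code like this.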