Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium

Authors: Gabriele Farina, Tuomas Sandholm

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold'em poker. We show that one-sided QPE can be computed more efficiently than all known prior refinements, paving the way to wider adoption of Nash equilibrium refinements in settings with perfectly rational machines (or humans perfectly actuating machine-generated strategies) that interact with players prone to mistakes. We also show that one-sided QPE tends to play better than a Nash equilibrium strategy against imperfect opponents. We compare one-sided QPEs against EFPE and (Miltersen-Sørensen) QPE along two metrics: 1) the time required to compute the refinement, and 2) how the refinement fares against imperfect opponents, when compared to an exact but potentially unrefined Nash equilibrium computed by the two state-of-the-art linear programming solvers CPLEX and Gurobi. We implemented from scratch the algorithm by Farina et al. [7] to solve the trembling linear programs corresponding to the three equilibrium refinements. (A hedged sketch of a sequence-form Nash LP baseline follows the table.)
Researcher Affiliation | Collaboration | Gabriele Farina, Computer Science Department, Carnegie Mellon University (gfarina@cs.cmu.edu); Tuomas Sandholm, Computer Science Department, CMU; Strategy Robot, Inc.; Strategic Machine, Inc.; Optimized Markets, Inc. (sandholm@cs.cmu.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any specific links to source code or explicitly state that the code is publicly available.
Open Datasets | Yes | We present extensive experiments on standard benchmark games and an endgame from the famous man-machine match where the AI Libratus was the first to beat top human specialist professionals in heads-up no-limit Texas hold'em poker. Six standard benchmark games: three instances of Leduc poker [29] of increasing size, one relatively large Goofspiel game [27], Liar's Dice, and one real river endgame from the Brains vs. AI competition that was played by the Libratus AI. A description of each game is available in the appendix.
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, sample counts, or specific split files). It mentions "CFR for 10000 iterations" to generate imperfect opponents, but this is a method for generating opponent strategies, not a standard dataset split.
Hardware Specification | Yes | Our implementation... was run on a machine with 32GB of RAM and an Intel processor running at a nominal speed of 2.4GHz per core.
Software Dependencies | No | We start from the value ϵ = 10^-6 and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10^-6, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating-point numbers via GNU's MPFR library. From there onward... the value of ϵ is decreased by a factor of 1000 and solved again with our modified version of GLOP... The basis stability oracle is implemented using rational precision... We use GNU's GMP library to implement rational arithmetic. The paper mentions several software components (CPLEX, Gurobi, GLOP, MPFR, GMP) but does not provide specific version numbers for any of them. (An illustrative rational-arithmetic sketch follows the table.)
Experiment Setup | Yes | We start from the value ϵ = 10^-6 and use Gurobi to solve the linear program. After the first iteration, if the basis is not stable, we re-solve the linear program, again for ϵ = 10^-6, using Google's open-source linear programming solver (GLOP), which we modified so as to use 1000-digit precision floating-point numbers via GNU's MPFR library. From there onward, after every unsuccessful iteration of the algorithm (that is, where the basis is not stable), the value of ϵ is decreased by a factor of 1000 and solved again with our modified version of GLOP, until a stable basis is found. In the river endgame we implemented the sparsification technique described in Zhang and Sandholm [33] to bring down the number of nonzeros of the payoff matrix from 21 million to roughly 167 thousand combined nonzeros in the sparsification when solving the linear program at each iteration. (A minimal sketch of this ϵ-annealing loop follows the table.)
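
The ϵ-annealing procedure quoted in the Experiment Setup row can be summarized in code. The following is a minimal sketch, not the authors' implementation: the callables solve_fast, solve_high_precision, and basis_is_stable are hypothetical stand-ins for the paper's Gurobi solve, the modified 1000-digit-precision GLOP solve, and the exact basis-stability oracle, respectively.

```python
# Minimal sketch of the epsilon-annealing loop described in the Experiment Setup row.
# The three callables are assumptions standing in for the paper's actual solvers/oracle.
from typing import Any, Callable, Tuple

def anneal_trembling_lp(
    solve_fast: Callable[[float], Tuple[Any, Any]],            # e.g. a Gurobi solve: eps -> (solution, basis)
    solve_high_precision: Callable[[float], Tuple[Any, Any]],  # e.g. a 1000-digit-precision GLOP solve
    basis_is_stable: Callable[[Any], bool],                    # exact (rational-arithmetic) stability oracle
    eps0: float = 1e-6,
    shrink: float = 1e-3,
    max_iters: int = 20,
):
    """Solve the trembling LP for decreasing eps until a stable basis is found."""
    eps = eps0
    solution, basis = solve_fast(eps)           # first attempt with the fast floating-point solver
    if basis_is_stable(basis):
        return solution, eps
    for _ in range(max_iters):
        solution, basis = solve_high_precision(eps)  # re-solve in high precision
        if basis_is_stable(basis):
            return solution, eps
        eps *= shrink                           # unsuccessful iteration: decrease eps by a factor of 1000
    raise RuntimeError("no stable basis found within the iteration budget")
```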
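The basis-stability oracle is implemented with exact rational arithmetic (GNU GMP in the paper). Purely as an illustration of that style of computation, the sketch below solves a hypothetical basis system B z = b exactly over the rationals with Python's fractions module; it is not the authors' GMP-based code and assumes the basis matrix is nonsingular.

```python
# Illustration only: exact rational solve of a (nonsingular) basis system B z = b,
# in the spirit of the paper's GMP-based rational-precision checks.
from fractions import Fraction

def solve_exact(B, b):
    """Gauss-Jordan elimination over the rationals; returns z with B z = b exactly."""
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)  # pivot search (assumes nonsingular B)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Example: solve_exact([[2, 1], [1, 3]], [3, 5]) == [Fraction(4, 5), Fraction(7, 5)]
```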
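The unrefined Nash equilibrium baselines are reported as computed with CPLEX and Gurobi. The paper does not spell out the formulation, but a standard choice for two-player zero-sum extensive-form games is von Stengel's sequence-form LP. The gurobipy sketch below is an assumption-laden illustration: the sequence-form payoff matrix A and constraint data (E, e) for player 1 and (F, f) for player 2 are taken as given inputs, and this is not claimed to be the authors' exact setup.

```python
# Hedged sketch: an (unrefined) sequence-form Nash equilibrium LP solved with Gurobi.
# A[i, j] is player 1's chance-weighted payoff for sequence pair (i, j);
# E x = e, x >= 0 and F y = f, y >= 0 define the players' realization plans.
import gurobipy as gp
from gurobipy import GRB

def sequence_form_nash(A, E, e, F, f):
    """Solve max_x min_y x^T A y over realization plans (von Stengel's LP)."""
    n1 = E.shape[1]   # number of player-1 sequences
    k2 = F.shape[0]   # number of player-2 infosets + 1 (dual variables)

    m = gp.Model("sequence_form_nash")
    x = m.addMVar(n1, lb=0.0, name="x")               # player-1 realization plan
    v = m.addMVar(k2, lb=-GRB.INFINITY, name="v")     # dual of player-2's best response

    m.setObjective(f @ v, GRB.MAXIMIZE)
    m.addConstr(E @ x == e, name="realization_plan")  # x is a valid realization plan
    m.addConstr(A.T @ x - F.T @ v >= 0, name="best_response_dual")
    m.optimize()
    return x.X, m.ObjVal
```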