Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Authors: Brian Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen McAleer, Andreas Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning.
Researcher Affiliation | Collaboration | Brian Hu Zhang (Carnegie Mellon University) bhzhang@cs.cmu.edu; Gabriele Farina (MIT) gfarina@mit.edu; Ioannis Anagnostides (Carnegie Mellon University) ianagnos@cs.cmu.edu; Federico Cacciamani (DEIB, Politecnico di Milano) federico.cacciamani@polimi.it; Stephen McAleer (Carnegie Mellon University) smcaleer@cs.cmu.edu; Andreas Haupt (MIT) haupt@mit.edu; Andrea Celli (Bocconi University) andrea.celli2@unibocconi.it; Nicola Gatti (DEIB, Politecnico di Milano) nicola.gatti@polimi.it; Vincent Conitzer (Carnegie Mellon University) conitzer@cs.cmu.edu; Tuomas Sandholm (Carnegie Mellon University; Strategic Machine, Inc.; Strategy Robot, Inc.; Optimized Markets, Inc.) sandholm@cs.cmu.edu
Pseudocode | Yes | Algorithm 1: Pseudocode for the binary search-based algorithm; Algorithm 2: Regret minimization over a conic hull (an illustrative binary-search sketch is given after this table).
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The game instances we use are described in detail in Appendix F, and belong to the following eight different classes of established parametric benchmark games, each identified with an alphabetical mnemonic: B Battleship [30], D Liar's dice [68], GL Goofspiel [88], K Kuhn poker [61], L Leduc poker [93], RS ridesharing game [101], S Sheriff [30], TP double dummy bridge game [101].
Dataset Splits | No | The paper does not specify traditional dataset splits (e.g., 70% training, 15% validation, 15% test), as it primarily deals with game environments and learning dynamics rather than static datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper mentions algorithms and frameworks such as CFR+ [95], the PSRO algorithm [63], and proximal policy optimization (PPO) [90], but does not specify exact version numbers for these or any other software dependencies.
Experiment Setup | Yes | We used CFR+ [95] as learning algorithm and a fixed Lagrange multiplier λ := 25 to compute the optimal communication equilibrium that corresponds to the optimal mechanism. We terminated the learning procedure after 10000 iterations, at a duality gap for (L1) of approximately 4.2 × 10⁻⁴. We use online optimistic gradient descent to update the Lagrange multipliers of each player... In the table, we report the best runtime across all choices of the stepsize hyperparameter η ∈ {0.01, 0.1, 1.0, 10.0} used in online optimistic gradient descent to update the Lagrange multipliers.
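
The experiment setup above mentions online optimistic gradient descent on the per-player Lagrange multipliers with stepsize η. As a hedged illustration (not the authors' implementation), the sketch below applies a projected optimistic update, λ ← clip(λ + η(2·g_t − g_{t−1}), 0, λ_max), to a stream of constraint-violation vectors; the interface, the cap λ_max, and the default η are assumptions made for this example.

```python
import numpy as np

def optimistic_multiplier_updates(constraint_violations, eta=0.1, lam_max=100.0):
    """Projected optimistic gradient updates on nonnegative Lagrange multipliers.

    constraint_violations: iterable of arrays g_t (one entry per constraint/player).
    Returns the list of multiplier vectors after each update (hypothetical sketch).
    """
    lam, g_prev, history = None, None, []
    for g in constraint_violations:
        g = np.asarray(g, dtype=float)
        if lam is None:                      # lazy initialization on the first gradient
            lam = np.zeros_like(g)
            g_prev = np.zeros_like(g)
        # Optimistic step: use the prediction-corrected gradient 2*g_t - g_{t-1},
        # then project back onto the box [0, lam_max].
        lam = np.clip(lam + eta * (2.0 * g - g_prev), 0.0, lam_max)
        g_prev = g
        history.append(lam.copy())
    return history


# Toy usage with three synthetic violation vectors for two players.
print(optimistic_multiplier_updates([np.array([0.5, -0.2])] * 3, eta=0.1))
```

With η drawn from a small grid such as the {0.01, 0.1, 1.0, 10.0} reported above, one would keep the best-performing stepsize, as described in the setup.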
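
To make the pseudocode referenced in the table (Algorithm 1) more concrete, here is a minimal, hypothetical sketch of a binary-search wrapper around a black-box zero-sum solver. It assumes a function solve_lagrangian(t) that returns a nonnegative game value exactly when the candidate objective value t is achievable; the function name, search interval, and tolerance are illustrative assumptions, not details taken from the paper.

```python
def binary_search_optimum(solve_lagrangian, lo=0.0, hi=1.0, tol=1e-4):
    """Binary search for the largest achievable objective value in [lo, hi].

    Assumes solve_lagrangian(t) >= 0 if and only if value t is achievable,
    so achievability is monotone in t (hypothetical interface).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if solve_lagrangian(mid) >= 0.0:
            lo = mid   # value mid is achievable; search higher
        else:
            hi = mid   # value mid is not achievable; search lower
    return lo


# Toy usage with a stand-in solver whose value crosses zero at t = 0.7.
print(binary_search_optimum(lambda t: 0.7 - t))  # ≈ 0.7
```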