Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games
Authors: Adrian Rivera Cardoso, Jacob Abernethy, He Wang, Huan Xu
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the problem of repeated play in a zero-sum game in which the payoff matrix may change, in a possibly adversarial fashion, on each round; we call these Online Matrix Games. Finding the Nash Equilibrium (NE) of a two-player zero-sum game is core to many problems in statistics, optimization, and economics, and for a fixed game matrix this can be easily reduced to solving a linear program. But when the payoff matrix evolves over time our goal is to find a sequential algorithm that can compete with, in a certain sense, the NE of the long-term-averaged payoff matrix. We design an algorithm with small NE regret; that is, we ensure that the long-term payoff of both players is close to minimax optimum in hindsight. Our algorithm achieves near-optimal dependence with respect to the number of rounds and depends poly-logarithmically on the number of available actions of the players. Additionally, we show that the naive reduction, where each player simply minimizes its own regret, fails to achieve the stated objective regardless of which algorithm is used. Lastly, we consider the so-called bandit setting, where the feedback is significantly limited, and we provide an algorithm with small NE regret using one-point estimates of each payoff matrix. |
| Researcher Affiliation | Academia | (1) Department of Industrial and Systems Engineering, Georgia Institute of Technology, GA, USA; (2) Department of Computer Science, Georgia Institute of Technology, GA, USA. |
| Pseudocode | Yes | Algorithm 1: Saddle-Point Regularized-Follow-the-Leader (SP-RFTL); Algorithm 2: Online-Matrix-Games Regularized-Follow-the-Leader (OMG-RFTL); Algorithm 3: Bandit Online-Matrix-Games Regularized-Follow-the-Leader (BANDIT-OMG-RFTL) |
| Open Source Code | No | The paper does not include any explicit statements or links about providing open-source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not describe empirical experiments involving datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits for validation. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific setup details like hyperparameters or training settings. |
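
The abstract describes NE regret only in words: the players' cumulative payoff should be close to the minimax value of the summed (equivalently, long-term-averaged) payoff matrices in hindsight. The sketch below is an illustration and is not taken from the paper. It assumes NE regret is the absolute gap between the realized payoff and that minimax value, that the row player minimizes, and that every matrix has exactly two rows and integer entries so the value can be found exactly by checking the breakpoints of a piecewise-linear function. The names `game_value_2xn`, `bilinear`, and `ne_regret` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import combinations


def game_value_2xn(A):
    """Value min_x max_y x^T A y of a 2-row matrix A (row player minimizes).

    Assumes integer entries. The row player mixes with weights (p, 1 - p);
    the worst-case payoff is a convex piecewise-linear function of p, so its
    minimum sits at p = 0, p = 1, or where two column lines cross.
    """
    top, bottom = A

    def worst_case(p):
        return max(p * a + (1 - p) * b for a, b in zip(top, bottom))

    candidates = [Fraction(0), Fraction(1)]
    for (a1, b1), (a2, b2) in combinations(zip(top, bottom), 2):
        denom = (a1 - b1) - (a2 - b2)
        if denom != 0:  # parallel lines never cross
            p = Fraction(b2 - b1, denom)
            if 0 <= p <= 1:
                candidates.append(p)
    return min(worst_case(p) for p in candidates)


def bilinear(x, A, y):
    """Expected payoff x^T A y for mixed strategies x and y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def ne_regret(matrices, plays):
    """Assumed definition: |sum_t x_t^T A_t y_t - min_x max_y sum_t x^T A_t y|."""
    realized = sum(bilinear(x, A, y) for A, (x, y) in zip(matrices, plays))
    n_cols = len(matrices[0][0])
    summed = [[sum(A[i][j] for A in matrices) for j in range(n_cols)]
              for i in range(2)]
    return abs(realized - game_value_2xn(summed))


# Two rounds whose payoff matrices cancel, so the summed game has value 0.
pennies = [[1, -1], [-1, 1]]
flipped = [[-1, 1], [1, -1]]
half = Fraction(1, 2)
uniform_plays = [((half, half), (half, half))] * 2
pure_plays = [((1, 0), (1, 0)), ((1, 0), (0, 1))]
```

With these two rounds, uniform play by both players realizes a total payoff equal to the value of the summed game, while the pure-strategy sequence does not, which is the kind of gap NE regret is meant to measure.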