Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games

Authors: Adrian Rivera Cardoso, Jacob Abernethy, He Wang, Huan Xu

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of repeated play in a zero-sum game in which the payoff matrix may change, in a possibly adversarial fashion, on each round; we call these Online Matrix Games. Finding the Nash Equilibrium (NE) of a two-player zero-sum game is core to many problems in statistics, optimization, and economics, and for a fixed game matrix this can be easily reduced to solving a linear program. But when the payoff matrix evolves over time, our goal is to find a sequential algorithm that can compete with, in a certain sense, the NE of the long-term-averaged payoff matrix. We design an algorithm with small NE regret; that is, we ensure that the long-term payoff of both players is close to the minimax optimum in hindsight. Our algorithm achieves near-optimal dependence with respect to the number of rounds and depends poly-logarithmically on the number of available actions of the players. Additionally, we show that the naive reduction, where each player simply minimizes its own regret, fails to achieve the stated objective regardless of which algorithm is used. Lastly, we consider the so-called bandit setting, where the feedback is significantly limited, and we provide an algorithm with small NE regret using one-point estimates of each payoff matrix.
Researcher Affiliation | Academia | 1) Department of Industrial and Systems Engineering, Georgia Institute of Technology, GA, USA; 2) Department of Computer Science, Georgia Institute of Technology, GA, USA.
Pseudocode | Yes | Algorithm 1: Saddle-Point Regularized-Follow-the-Leader (SP-RFTL); Algorithm 2: Online-Matrix-Games Regularized-Follow-the-Leader (OMG-RFTL); Algorithm 3: Bandit Online-Matrix-Games Regularized-Follow-the-Leader (BANDIT-OMG-RFTL)
Open Source Code | No | The paper does not include any explicit statements or links about providing open-source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not describe empirical experiments involving datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits for validation.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific setup details such as hyperparameters or training settings.
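
The abstract quoted above notes that, for a fixed payoff matrix, finding the NE of a two-player zero-sum game reduces to solving a linear program. The sketch below illustrates that classical reduction; it is our own minimal example, not code from the paper, and the function name, the choice of scipy's LP solver, and the row-player-as-maximizer convention are assumptions on our part.

```python
import numpy as np
from scipy.optimize import linprog

def nash_equilibrium_row(A):
    """Max-min strategy of the row player for a fixed zero-sum payoff
    matrix A, where the row player receives x^T A y and wants to maximize it.

    Solves max_{x in simplex} min_j (A^T x)_j as a linear program -- the
    classical reduction mentioned in the abstract (sketch, not the paper's code).
    """
    n, m = A.shape
    # Decision variables z = (x_1, ..., x_n, v); linprog minimizes, so use -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # Constraints: v - (A^T x)_j <= 0 for every column j.
    A_ub = np.hstack([-A.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # Simplex constraint: sum_i x_i = 1.
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]  # x >= 0, game value v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Example: matching pennies has value 0 and a uniform equilibrium strategy.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_star, value = nash_equilibrium_row(A)
print(x_star, value)  # approximately [0.5, 0.5], 0.0
```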
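The Pseudocode row lists three regularized-follow-the-leader (RFTL) variants. As a rough illustration of the generic RFTL template they build on, the sketch below runs a follow-the-regularized-leader update over the probability simplex with a negative-entropy regularizer (which gives a multiplicative-weights-style closed form) for both players of an online matrix game. This is an assumed, simplified sketch and not a reproduction of Algorithms 1-3; indeed, the abstract stresses that this naive reduction, in which each player independently minimizes its own regret, does not by itself guarantee small NE regret.

```python
import numpy as np

def entropic_rftl_step(cumulative_grad, eta):
    """One RFTL update over the probability simplex with a negative-entropy
    regularizer; the minimizer has the closed form w proportional to
    exp(-eta * cumulative_grad)."""
    logits = -eta * cumulative_grad
    logits -= logits.max()          # numerical stability before exponentiating
    w = np.exp(logits)
    return w / w.sum()

def play_online_matrix_game(payoff_matrices, eta=0.1):
    """Both players run entropic RFTL on the linear losses induced by the
    sequence A_1, ..., A_T, where x minimizes x^T A_t y and y maximizes it.
    Illustrative sketch only (hypothetical helper, not the paper's algorithm)."""
    n, m = payoff_matrices[0].shape
    Gx, Gy = np.zeros(n), np.zeros(m)   # cumulative gradients for each player
    history = []
    for A in payoff_matrices:
        x = entropic_rftl_step(Gx, eta)  # minimizer's mixed strategy
        y = entropic_rftl_step(Gy, eta)  # maximizer's mixed strategy
        history.append((x, y))
        Gx += A @ y                      # gradient of x^T A y with respect to x
        Gy += -(A.T @ x)                 # maximizer descends the negated gradient
    return history
```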