Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games

Authors: Adrian Rivera Cardoso, Jacob Abernethy, He Wang, Huan Xu

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of repeated play in a zero-sum game in which the payoff matrix may change, in a possibly adversarial fashion, on each round; we call these Online Matrix Games. Finding the Nash Equilibrium (NE) of a two-player zero-sum game is core to many problems in statistics, optimization, and economics, and for a fixed game matrix this can be easily reduced to solving a linear program. But when the payoff matrix evolves over time, our goal is to find a sequential algorithm that can compete with, in a certain sense, the NE of the long-term-averaged payoff matrix. We design an algorithm with small NE regret; that is, we ensure that the long-term payoff of both players is close to the minimax optimum in hindsight. Our algorithm achieves near-optimal dependence with respect to the number of rounds and depends poly-logarithmically on the number of available actions of the players. Additionally, we show that the naive reduction, where each player simply minimizes its own regret, fails to achieve the stated objective regardless of which algorithm is used. Lastly, we consider the so-called bandit setting, where the feedback is significantly limited, and we provide an algorithm with small NE regret using one-point estimates of each payoff matrix.
Researcher Affiliation | Academia | 1) Department of Industrial and Systems Engineering, Georgia Institute of Technology, GA, USA; 2) Department of Computer Science, Georgia Institute of Technology, GA, USA.
Pseudocode | Yes | Algorithm 1: Saddle-Point Regularized-Follow-the-Leader (SP-RFTL); Algorithm 2: Online-Matrix-Games Regularized-Follow-the-Leader (OMG-RFTL); Algorithm 3: Bandit Online-Matrix-Games Regularized-Follow-the-Leader (BANDIT-OMG-RFTL)
Open Source Code | No | The paper does not include any explicit statements or links about providing open-source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not describe empirical experiments involving datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits for validation.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific setup details such as hyperparameters or training settings.
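
The abstract quoted above notes that, for a fixed payoff matrix, finding the NE of a two-player zero-sum game reduces to solving a linear program. The sketch below illustrates that classical reduction; it is our own minimal example, not code from the paper, and the function name, the choice of scipy's LP solver, and the row-player-as-maximizer convention are assumptions on our part.

```python
import numpy as np
from scipy.optimize import linprog

def nash_equilibrium_row(A):
    """Max-min strategy of the row player for a fixed zero-sum payoff
    matrix A, where the row player receives x^T A y and wants to maximize it.

    Solves max_{x in simplex} min_j (A^T x)_j as a linear program -- the
    classical reduction mentioned in the abstract (sketch, not the paper's code).
    """
    n, m = A.shape
    # Decision variables z = (x_1, ..., x_n, v); linprog minimizes, so use -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # Constraints: v - (A^T x)_j <= 0 for every column j.
    A_ub = np.hstack([-A.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # Simplex constraint: sum_i x_i = 1.
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]  # x >= 0, game value v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Example: matching pennies has value 0 and a uniform equilibrium strategy.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_star, value = nash_equilibrium_row(A)
print(x_star, value)  # approximately [0.5, 0.5], 0.0
```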
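The Pseudocode row lists three regularized-follow-the-leader (RFTL) variants. As a rough illustration of the generic RFTL template they build on, the sketch below runs a follow-the-regularized-leader update over the probability simplex with a negative-entropy regularizer (which gives a multiplicative-weights-style closed form) for both players of an online matrix game. This is an assumed, simplified sketch and not a reproduction of Algorithms 1-3; indeed, the abstract stresses that this naive reduction, in which each player independently minimizes its own regret, does not by itself guarantee small NE regret.

```python
import numpy as np

def entropic_rftl_step(cumulative_grad, eta):
    """One RFTL update over the probability simplex with a negative-entropy
    regularizer; the minimizer has the closed form w proportional to
    exp(-eta * cumulative_grad)."""
    logits = -eta * cumulative_grad
    logits -= logits.max()          # numerical stability before exponentiating
    w = np.exp(logits)
    return w / w.sum()

def play_online_matrix_game(payoff_matrices, eta=0.1):
    """Both players run entropic RFTL on the linear losses induced by the
    sequence A_1, ..., A_T, where x minimizes x^T A_t y and y maximizes it.
    Illustrative sketch only (hypothetical helper, not the paper's algorithm)."""
    n, m = payoff_matrices[0].shape
    Gx, Gy = np.zeros(n), np.zeros(m)   # cumulative gradients for each player
    history = []
    for A in payoff_matrices:
        x = entropic_rftl_step(Gx, eta)  # minimizer's mixed strategy
        y = entropic_rftl_step(Gy, eta)  # maximizer's mixed strategy
        history.append((x, y))
        Gx += A @ y                      # gradient of x^T A y with respect to x
        Gy += -(A.T @ x)                 # maximizer descends the negated gradient
    return history
```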