A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We first conduct numerical simulations to investigate the impact of choosing different τ, which is used to define the softmax operator in Algorithms 1 and 2. Our theoretical results indicate that there is an asymptotically non-vanishing bias due to using a positive τ. Intuitively, since a softmax policy always has strictly positive entries while a Nash equilibrium policy can have zero entries, we cannot, in general, expect the Nash gap to converge to zero. To demonstrate this phenomenon, consider the following example of a zero-sum matrix game. Let [...] be the payoff matrix for player 1, and let R2 = (R1) , where N > 0 is a tunable parameter. Note that this matrix game has a unique Nash equilibrium, which goes to the joint policy π1 = (1/3, 2/3, 0), π2 = (0, 2/3, 1/3) as N . In our simulations, we use constant stepsizes αk 0, 5 and βk 0.01 and run Algorithm 1 for 100 trajectories (each has K = 2000 iterations). Then, we plot the average Nash gap (averaged over the 100 trajectories) as a function of the number of iterations k in Figure 1 for different temperatures τ.
Researcher Affiliation Academia 1CMS, Caltech, *zchen458@caltech.edu, mazumdar@caltech.edu, adamw@caltech.edu 2ECE & ISR, University of Maryland, College Park, kaiqing@umd.edu 3EECS, MIT, asuman@mit.edu
Pseudocode Yes Algorithm 1 Independent Learning Dynamics in Zero-Sum Matrix Games... Algorithm 2 Value Iteration with Smoothed Best-Response (VI-SBR) Dynamics
Open Source Code No The paper does not provide any statement or link for open-source code for the methodology described.
Open Datasets No The paper describes using specific game environments (e.g., "rock-paper-scissor game") and custom matrix game setups for numerical simulations, but it does not refer to or provide access information for any publicly available or open datasets in the traditional sense.
Dataset Splits No The paper describes simulation setups in game environments but does not provide specific training/test/validation dataset splits.
Hardware Specification No The paper describes running numerical simulations and experiments but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used.
Software Dependencies No The paper mentions running simulations but does not specify any software names with version numbers for reproducibility.
Experiment Setup Yes In our simulations, we use constant stepsizes αk 0, 5 and βk 0.01 and run Algorithm 1 for 100 trajectories (each has K = 2000 iterations).