A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first conduct numerical simulations to investigate the impact of choosing different τ, which is used to define the softmax operator in Algorithms 1 and 2. Our theoretical results indicate that there is an asymptotically non-vanishing bias due to using a positive τ. Intuitively, since a softmax policy always has strictly positive entries while a Nash equilibrium policy can have zero entries, we cannot, in general, expect the Nash gap to converge to zero. To demonstrate this phenomenon, consider the following example of a zero-sum matrix game. Let [...] be the payoff matrix for player 1, and let R2 = (R1) , where N > 0 is a tunable parameter. Note that this matrix game has a unique Nash equilibrium, which goes to the joint policy π1 = (1/3, 2/3, 0), π2 = (0, 2/3, 1/3) as N . In our simulations, we use constant stepsizes αk 0, 5 and βk 0.01 and run Algorithm 1 for 100 trajectories (each has K = 2000 iterations). Then, we plot the average Nash gap (averaged over the 100 trajectories) as a function of the number of iterations k in Figure 1 for different temperatures τ. |
| Researcher Affiliation | Academia | 1CMS, Caltech, *zchen458@caltech.edu, mazumdar@caltech.edu, adamw@caltech.edu 2ECE & ISR, University of Maryland, College Park, kaiqing@umd.edu 3EECS, MIT, asuman@mit.edu |
| Pseudocode | Yes | Algorithm 1 Independent Learning Dynamics in Zero-Sum Matrix Games... Algorithm 2 Value Iteration with Smoothed Best-Response (VI-SBR) Dynamics |
| Open Source Code | No | The paper does not provide any statement or link for open-source code for the methodology described. |
| Open Datasets | No | The paper describes using specific game environments (e.g., "rock-paper-scissor game") and custom matrix game setups for numerical simulations, but it does not refer to or provide access information for any publicly available or open datasets in the traditional sense. |
| Dataset Splits | No | The paper describes simulation setups in game environments but does not provide specific training/test/validation dataset splits. |
| Hardware Specification | No | The paper describes running numerical simulations and experiments but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used. |
| Software Dependencies | No | The paper mentions running simulations but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | In our simulations, we use constant stepsizes αk 0, 5 and βk 0.01 and run Algorithm 1 for 100 trajectories (each has K = 2000 iterations). |