Online Learning in Periodic Zero-Sum Games
Authors: Tanner Fiez, Ryann Sim, Stratis Skoulakis, Georgios Piliouras, Lillian Ratliff
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present several experimental simulations that illustrate our theoretical results. To begin, for continuous-time GDA dynamics we show that Poincaré recurrence holds in a periodic zero-sum bilinear game. We consider the ubiquitous Matching Pennies game with payoff matrix A = [[1, -1], [-1, 1]]. We then use the following periodic rescaling with period 2π: α(t) = sin(t) for 0 ≤ t mod 2π ≤ 3π/2, and α(t) = (2/π)((t mod 2π) − 2π) for 3π/2 < t mod 2π ≤ 2π. When players follow the GDA learning dynamics, we see from Figure 1 that the trajectories, plotted alongside the value of the periodic rescaling, are bounded. A similar experimental result holds in the case of FTRL dynamics. In the supplementary material, we simulate replicator dynamics with the same periodic rescaling as in Figure 1. The trajectories in the dual/payoff space also remain bounded due to the invariance of the Kullback-Leibler (KL) divergence. Figure 1: Bounded trajectories for a periodically rescaled Matching Pennies game updated using GDA. Figure 2: Weighted sum of KL-divergences for a 64-player periodically rescaled Matching Pennies game. Note that despite the complicated trajectories of each player, the weighted sum of their divergences remains constant. Lemmas 2 and 4 describe functions Φ which remain time-invariant. In the case of replicator dynamics, Φ(t) is the sum of KL-divergences measured between the strategy of each player and their mixed Nash strategy [1/2, 1/2]. We simulated a 64-player polymatrix extension of the Matching Pennies game, where each agent plays against the opponents immediately adjacent to them, forming a toroid-like chain of games. Furthermore, we randomly rescale each game with a different periodic function. Figure 2 depicts the claim presented in the lemmas: although each agent's individual divergence term KL(x_i* ∥ x_i(t)) fluctuates, the sum Σ_{i∈V} KL(x_i* ∥ x_i(t)) remains constant.
Figure 3 shows data from a simplified 64-player polymatrix game simulation, where the graph representing player interactions is sparse. Here, the strategy of each player determines the RGB value of a corresponding pixel on a grid. If the system exhibits Poincaré recurrence, similar patterns should eventually re-emerge as the pixels change color over time (i.e., as the corresponding mixed strategies evolve). As observed in Figure 3, the system returns near the initial image at time T = 6226. Further details about the experiments can be found in Appendix F. |
| Researcher Affiliation | Academia | Tanner Fiez University of Washington Seattle, Washington fiezt@uw.edu Ryann Sim SUTD Singapore ryann_sim@mymail.sutd.edu.sg Stratis Skoulakis SUTD Singapore efstratios@sutd.edu.sg Georgios Piliouras SUTD Singapore georgios@sutd.edu.sg Lillian Ratliff University of Washington Seattle, Washington ratliffl@uw.edu |
| Pseudocode | No | The paper describes mathematical dynamics using equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and instructions required to reproduce the results are given in several formats within the supplementary material. |
| Open Datasets | No | The paper conducts simulations of game theory models, specifically mentioning the Matching Pennies game; it does not use external datasets. |
| Dataset Splits | No | The paper simulates dynamical systems based on game theory and does not involve machine learning training on external datasets with train/validation/test splits. |
| Hardware Specification | No | The paper does not specify the hardware used for the simulations. |
| Software Dependencies | No | The paper does not list software dependencies or version numbers. |
| Experiment Setup | Yes | In this section, we present several experimental simulations that illustrate our theoretical results. To begin, for continuous-time GDA dynamics we show that Poincaré recurrence holds in a periodic zero-sum bilinear game. We consider the ubiquitous Matching Pennies game with payoff matrix A = [[1, -1], [-1, 1]]. We then use the following periodic rescaling with period 2π: α(t) = sin(t) for 0 ≤ t mod 2π ≤ 3π/2, and α(t) = (2/π)((t mod 2π) − 2π) for 3π/2 < t mod 2π ≤ 2π... We simulated a 64-player polymatrix extension of the Matching Pennies game, where each agent plays against the opponents immediately adjacent to them, forming a toroid-like chain of games. Furthermore, we randomly rescale each game with a different periodic function. |
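The bounded-trajectory claim for GDA can be illustrated with a minimal sketch (this is not the authors' code; the initial point, step size, and RK4 integrator are illustrative choices). For continuous-time GDA on a bilinear zero-sum game, x' = −α(t)·A·y and y' = α(t)·Aᵀ·x, the quantity ‖x‖² + ‖y‖² is conserved regardless of the rescaling α(t), since d/dt ½(‖x‖² + ‖y‖²) = −α(t)·xᵀAy + α(t)·yᵀAᵀx = 0. The sketch integrates the dynamics under the piecewise rescaling described above and checks that the norm stays constant:

```python
import numpy as np

# Illustrative sketch (not the paper's code): continuous-time GDA on a
# periodically rescaled Matching Pennies bilinear game,
#   x' = -alpha(t) * A @ y,   y' = alpha(t) * A.T @ x.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # Matching Pennies payoff matrix

def alpha(t):
    """Periodic rescaling with period 2*pi: a sine arc, then a linear return."""
    s = t % (2 * np.pi)
    return np.sin(s) if s <= 1.5 * np.pi else (2 / np.pi) * (s - 2 * np.pi)

def vector_field(t, z):
    x, y = z[:2], z[2:]
    return np.concatenate([-alpha(t) * A @ y, alpha(t) * A.T @ x])

def rk4_step(t, z, dt):
    # Classic fourth-order Runge-Kutta step.
    k1 = vector_field(t, z)
    k2 = vector_field(t + dt / 2, z + dt / 2 * k1)
    k3 = vector_field(t + dt / 2, z + dt / 2 * k2)
    k4 = vector_field(t + dt, z + dt * k3)
    return z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

z = np.array([0.3, 0.1, -0.2, 0.4])  # arbitrary initial (x, y)
r0 = np.linalg.norm(z)
t, dt = 0.0, 1e-3
for _ in range(int(10 * np.pi / dt)):  # integrate over ~5 periods
    z = rk4_step(t, z, dt)
    t += dt

# ||(x, y)|| is an invariant of bilinear GDA, so trajectories stay bounded;
# the relative drift below is due only to numerical integration error.
print(abs(np.linalg.norm(z) / r0 - 1))
```

The invariance holds for any time-dependent rescaling α(t), which is the mechanism behind the bounded trajectories seen in Figure 1; the KL-divergence invariant for replicator dynamics (Lemma 4) plays the analogous role in Figure 2.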