Online Learning in Periodic Zero-Sum Games

Authors: Tanner Fiez, Ryann Sim, Stratis Skoulakis, Georgios Piliouras, Lillian Ratliff

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present several experimental simulations that illustrate our theoretical results. To begin, for continuous-time GDA dynamics we show that Poincaré recurrence holds in a periodic zero-sum bilinear game. We consider the ubiquitous Matching Pennies game with payoff matrix A = [[1, -1], [-1, 1]]. We then use the following periodic rescaling with period 2π: α(t) = sin(t) for 0 ≤ t mod 2π ≤ 3π/2, and α(t) = (2/π)((t mod 2π) − 2π) for 3π/2 < t mod 2π < 2π. When players follow the GDA learning dynamics, we see from Figure 1 that the trajectories, plotted alongside the value of the periodic rescaling, remain bounded. A similar experimental result holds in the case of FTRL dynamics. In the supplementary material, we simulate replicator dynamics with the same periodic rescaling as in Figure 1; the trajectories in the dual/payoff space also remain bounded due to the invariance of the Kullback-Leibler (KL) divergence.
Figure 1: Bounded trajectories for a periodically rescaled Matching Pennies game updated using GDA.
Figure 2: Weighted sum of KL-divergences for a 64-player periodically rescaled Matching Pennies game. Note that despite the complicated trajectories of each player, the weighted sum of their divergences remains constant.
Lemmas 2 and 4 describe functions Φ that remain time-invariant. In the case of replicator dynamics, Φ(t) is the sum of KL-divergences between the strategy of each player and their mixed Nash strategy [1/2, 1/2]. We simulated a 64-player polymatrix extension of the Matching Pennies game, where each agent plays against the opponents immediately adjacent to them, forming a toroid-like chain of games. Furthermore, we randomly rescale each game with a different periodic function. Figure 2 depicts the claim presented in the lemmas: although each agent's individual divergence term KL(x*_i ∥ x_i(t)) fluctuates, the sum Σ_{i ∈ V} KL(x*_i ∥ x_i(t)) remains constant.
To generate Figure 3, we simulate a simplified 64-player polymatrix game in which the graph representing player interactions is sparse. Here, the strategy of each player determines the RGB value of a corresponding pixel on a grid. If the system exhibits Poincaré recurrence, we should eventually see similar patterns re-emerge as the pixels change color over time (i.e., as the corresponding mixed strategies evolve). As observed in Figure 3, the system returns near the initial image at time T = 6226. Further details about the experiments can be found in Appendix F.
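The recurrence visualization can be mimicked on a much smaller example. A minimal sketch (not the authors' code): under unrescaled GDA on the scalar zero-sum game u(x, y) = x·y, the flow ẋ = y, ẏ = −x traces a circle, so the first return to a small neighborhood of the initial point should occur near t = 2π. The step size, horizon, and ball radius below are arbitrary illustrative choices.

```python
import math

def first_return_time(x0, y0, dt=1e-4, t_max=10.0, ball=0.05):
    """First time the GDA orbit of u(x, y) = x*y re-enters a small
    ball around its initial condition, after first leaving it."""
    x, y = x0, y0
    t, has_left = 0.0, False
    while t < t_max:
        # Explicit Euler step of the GDA flow: dx/dt = y, dy/dt = -x.
        x, y = x + dt * y, y - dt * x
        t += dt
        dist = math.hypot(x - x0, y - y0)
        if not has_left and dist > ball:
            has_left = True            # orbit has left the neighborhood
        elif has_left and dist < ball:
            return t                   # orbit has come back: recurrence
    return None

t_ret = first_return_time(1.0, 0.0)    # expected to be close to 2*pi
```

The authors' 64-player setup and the return time T = 6226 are not reproduced here; the point is only the detection logic: leave a neighborhood of the start, then report the first re-entry.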
Researcher Affiliation | Academia | Tanner Fiez (University of Washington, Seattle, Washington; fiezt@uw.edu); Ryann Sim (SUTD, Singapore; ryann_sim@mymail.sutd.edu.sg); Stratis Skoulakis (SUTD, Singapore; efstratios@sutd.edu.sg); Georgios Piliouras (SUTD, Singapore; georgios@sutd.edu.sg); Lillian Ratliff (University of Washington, Seattle, Washington; ratliffl@uw.edu)
Pseudocode | No | The paper describes the mathematical dynamics using equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and instructions required to reproduce the results are given in several formats within the supplementary material.
Open Datasets | No | The paper conducts simulations of game-theoretic models (e.g., Matching Pennies and its polymatrix extensions) rather than experiments on external datasets.
Dataset Splits | No | The paper simulates dynamical systems based on game theory and does not involve machine learning training on external datasets with train/validation/test splits.
Hardware Specification | No | The paper does not specify the hardware used to run the simulations.
Software Dependencies | No | The paper does not list the software dependencies or versions required to run the experiments.
Experiment Setup | Yes | In this section, we present several experimental simulations that illustrate our theoretical results. To begin, for continuous-time GDA dynamics we show that Poincaré recurrence holds in a periodic zero-sum bilinear game. We consider the ubiquitous Matching Pennies game with payoff matrix A = [[1, -1], [-1, 1]]. We then use the following periodic rescaling with period 2π: α(t) = sin(t) for 0 ≤ t mod 2π ≤ 3π/2, and α(t) = (2/π)((t mod 2π) − 2π) for 3π/2 < t mod 2π < 2π... We simulated a 64-player polymatrix extension of the Matching Pennies game, where each agent plays against the opponents immediately adjacent to them, forming a toroid-like chain of games. Furthermore, we randomly rescale each game with a different periodic function.
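The boundedness reported in Figure 1 follows from a conservation law: for the bilinear GDA flow ẋ = α(t)Ay, ẏ = −α(t)Aᵀx, the quantity ‖x‖² + ‖y‖² is constant for any rescaling α(t), because the two cross terms cancel. The sketch below is not the authors' code: the piecewise form of α(t) is an interpretation of the formula in the text, and the step size, horizon, and initial conditions are arbitrary illustrative choices.

```python
import numpy as np

# Matching Pennies payoff matrix for the row player.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

def alpha(t):
    """Assumed piecewise periodic rescaling with period 2*pi.

    Continuous at the breakpoint 3*pi/2: sin(3*pi/2) = -1 and
    (2/pi) * (3*pi/2 - 2*pi) = -1, so the two pieces agree.
    """
    s = t % (2 * np.pi)
    return np.sin(s) if s <= 1.5 * np.pi else (2 / np.pi) * (s - 2 * np.pi)

def simulate_gda(x, y, T=20.0, dt=1e-4):
    """Forward-Euler integration of the rescaled GDA flow.

    x ascends the gradient of alpha(t) * x^T A y, y descends it.
    Returns the trajectory of the conserved quantity ||x||^2 + ||y||^2.
    """
    energies = []
    t = 0.0
    while t < T:
        dx = alpha(t) * A @ y        # ascent direction for x
        dy = -alpha(t) * A.T @ x     # descent direction for y
        x, y = x + dt * dx, y + dt * dy
        energies.append(x @ x + y @ y)
        t += dt
    return np.array(energies)

energies = simulate_gda(np.array([1.0, 0.5]), np.array([-0.5, 1.0]))
```

Up to discretization error, the energy trace stays flat, which forces the payoff-space trajectories to remain bounded regardless of how the rescaling oscillates.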