Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games

Authors: Lucas Baudin, Rida Laraki

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical This paper proposes an extension of a popular decentralized discrete-time learning procedure when repeating a static game called fictitious play (FP) (Brown, 1951; Robinson, 1951) to a dynamic model called discounted stochastic game (Shapley, 1953). Our family of discrete-time FP procedures is proven to converge to the set of stationary Nash equilibria in identical interest discounted stochastic games. This extends similar convergence results for static games (Monderer & Shapley, 1996a). We then analyze the continuous-time counterpart of our FP procedures, which include as a particular case the best-response dynamic introduced and studied by Leslie et al. (2020) in the context of zero-sum stochastic games. We prove the converge of this dynamics to stationary Nash equilibria in identical-interest and zero-sum discounted stochastic games. Thanks to stochastic approximations, we can infer from the continuous-time convergence some discrete time results such as the convergence to stationary equilibria in zero sum and team stochastic games (Holler, 2020). ... Contributions. Our contributions are as follows: We define procedures to play stochastic games in discrete time combining ideas from fictitious play and Q-learning. We prove their convergence to the set of stationary Nash equilibria in identical interest stochastic games. The proof is given directly in discrete time. We define the continuous-time counterpart of our procedures. ... We prove the convergence of our continuous-time dynamics to the set of stationary equilibria in identical interest and in zero-sum stochastic games. The convergence in continuous time allows us to prove some new convergence results in discrete time for zero-sum and team stochastic games (Holler, 2020).
Researcher Affiliation Academia 1Universit e Paris-Dauphine PSL, France 2University of Liverpool, United Kingdom. Correspondence to: Lucas Baudin <lucas.baudin@dauphine.eu>.
Pseudocode Yes Algorithm 1 FP with Doubling Trick for Zero-Sum Games
Open Source Code No The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets No This is a theoretical paper focused on proving convergence of algorithms in game theory; it does not involve the use of empirical datasets for training.
Dataset Splits No This is a theoretical paper focused on proving convergence of algorithms in game theory; it does not involve data splits for validation or training.
Hardware Specification No The paper is theoretical and focuses on mathematical proofs and algorithm design; it does not describe any experimental setup that would require specific hardware specifications.
Software Dependencies No The paper is theoretical, focusing on mathematical concepts and algorithm proofs. It does not mention any specific software dependencies with version numbers for implementation or experimental purposes.
Experiment Setup No The paper is theoretical and focuses on mathematical proofs and algorithm design; it does not describe an experimental setup with hyperparameters or system-level training settings.