reproducibilityindex.ai

Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games

Authors: Lucas Baudin, Rida Laraki

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper proposes an extension of a popular decentralized discrete-time learning procedure when repeating a static game called fictitious play (FP) (Brown, 1951; Robinson, 1951) to a dynamic model called discounted stochastic game (Shapley, 1953). Our family of discrete-time FP procedures is proven to converge to the set of stationary Nash equilibria in identical interest discounted stochastic games. This extends similar convergence results for static games (Monderer & Shapley, 1996a). We then analyze the continuous-time counterpart of our FP procedures, which include as a particular case the best-response dynamic introduced and studied by Leslie et al. (2020) in the context of zero-sum stochastic games. We prove the converge of this dynamics to stationary Nash equilibria in identical-interest and zero-sum discounted stochastic games. Thanks to stochastic approximations, we can infer from the continuous-time convergence some discrete time results such as the convergence to stationary equilibria in zero sum and team stochastic games (Holler, 2020). ... Contributions. Our contributions are as follows: We define procedures to play stochastic games in discrete time combining ideas from fictitious play and Q-learning. We prove their convergence to the set of stationary Nash equilibria in identical interest stochastic games. The proof is given directly in discrete time. We define the continuous-time counterpart of our procedures. ... We prove the convergence of our continuous-time dynamics to the set of stationary equilibria in identical interest and in zero-sum stochastic games. The convergence in continuous time allows us to prove some new convergence results in discrete time for zero-sum and team stochastic games (Holler, 2020).
Researcher Affiliation	Academia	1Universit e Paris-Dauphine PSL, France 2University of Liverpool, United Kingdom. Correspondence to: Lucas Baudin <lucas.baudin@dauphine.eu>.
Pseudocode	Yes	Algorithm 1 FP with Doubling Trick for Zero-Sum Games
Open Source Code	No	The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets	No	This is a theoretical paper focused on proving convergence of algorithms in game theory; it does not involve the use of empirical datasets for training.
Dataset Splits	No	This is a theoretical paper focused on proving convergence of algorithms in game theory; it does not involve data splits for validation or training.
Hardware Specification	No	The paper is theoretical and focuses on mathematical proofs and algorithm design; it does not describe any experimental setup that would require specific hardware specifications.
Software Dependencies	No	The paper is theoretical, focusing on mathematical concepts and algorithm proofs. It does not mention any specific software dependencies with version numbers for implementation or experimental purposes.
Experiment Setup	No	The paper is theoretical and focuses on mathematical proofs and algorithm design; it does not describe an experimental setup with hyperparameters or system-level training settings.