Can We Find Nash Equilibria at a Linear Rate in Markov Games?

Authors: Zhuoqing Song, Jason D. Lee, Zhuoran Yang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study decentralized learning in two-player zero-sum discounted Markov games, where the goal is to design a policy optimization algorithm for either agent satisfying two properties. First, the player does not need to know the policy of the opponent to update its own policy. Second, when both players adopt the algorithm, their joint policy converges to a Nash equilibrium of the game. To this end, we construct a meta-algorithm, dubbed Homotopy-PO, which provably finds a Nash equilibrium at a global linear rate. In particular, Homotopy-PO interweaves two base algorithms, Local-Fast and Global-Slow, via homotopy continuation. Local-Fast enjoys local linear convergence, while Global-Slow converges globally but at a slower sublinear rate. By switching between these two base algorithms, Global-Slow essentially serves as a guide that identifies a benign neighborhood in which Local-Fast enjoys fast convergence. However, since the exact size of such a neighborhood is unknown, we apply a doubling trick to switch between the two base algorithms. The switching scheme is delicately designed so that the aggregated performance of the algorithm is driven by Local-Fast. Furthermore, we prove that Local-Fast and Global-Slow can both be instantiated by variants of the optimistic gradient descent/ascent (OGDA) method, which is of independent interest. (A schematic sketch of the doubling-trick switching scheme appears after this table.)
Researcher Affiliation | Academia | Zhuoqing Song (Fudan University, zqsong19@fudan.edu.cn); Jason D. Lee (Princeton University, jasonlee@princeton.edu); Zhuoran Yang (Yale University, zhuoran.yang@yale.edu)
Pseudocode | Yes | Algorithm 1: Homotopy-PO, a meta-algorithm with global linear convergence
Open Source Code | No | The paper does not provide any explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | No | The paper mentions generating a sequence of zero-sum Markov games randomly and independently, but it does not provide access information (a link, citation, or repository) that would make this generation process publicly available or point to a standard public dataset.
Dataset Splits | No | The paper describes how the Markov game model is generated (randomly and independently, with S = 10, A = B = 10, and gamma = 0.99) and how the initial policies are generated, but it does not specify any training/validation/test splits; the data are generated on the fly for the numerical experiments (see the illustrative generation sketch after this table).
Hardware Specification | No | The paper does not provide any details about the specific hardware used to run its experiments.
Software Dependencies | No | The paper does not specify any software dependencies or version numbers.
Experiment Setup | Yes | In all the experiments below, we set the stepsizes η = 0.1 in OGDA and η = 0.1 in Averaging OGDA. We find our algorithm has linear convergence in all the experiments with these stepsizes. (An illustrative OGDA update with this stepsize is sketched below.)
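For intuition, the doubling-trick switching described in the abstract might look roughly like the following. This is a minimal sketch under assumed interfaces: the callables global_slow, local_fast, and nash_gap are hypothetical stand-ins (e.g., for the OGDA-based instantiations), and the acceptance test simplifies the paper's actual switching rule in Algorithm 1.

```python
# Minimal sketch of the doubling-trick switching between the two base
# algorithms; names and the acceptance test are assumptions, not the
# authors' exact Algorithm 1.

def homotopy_po(policy, global_slow, local_fast, nash_gap, eps, init_budget=1):
    """Alternate Global-Slow and Local-Fast phases, doubling the phase length."""
    budget = init_budget
    while nash_gap(policy) > eps:
        # Global-Slow phase: sublinear but globally convergent; it steers the
        # joint policy toward a benign neighborhood of a Nash equilibrium.
        for _ in range(budget):
            policy = global_slow(policy)

        # Local-Fast phase: linearly convergent once inside that neighborhood.
        candidate = policy
        for _ in range(budget):
            candidate = local_fast(candidate)

        # Keep the Local-Fast iterate only if it improved the Nash gap;
        # otherwise the neighborhood has not been reached yet, so the budget
        # is doubled and the two phases are repeated.
        if nash_gap(candidate) < nash_gap(policy):
            policy = candidate
        budget *= 2
    return policy
```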
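Since the paper does not release its generation script, the following is only a guess at how a random zero-sum Markov game with the reported sizes (S = 10, A = B = 10, gamma = 0.99) could be drawn; the function name and the uniform distributions are assumptions.

```python
import numpy as np

def random_zero_sum_markov_game(S=10, A=10, B=10, gamma=0.99, seed=0):
    """Draw a random zero-sum Markov game with the sizes reported in the paper."""
    rng = np.random.default_rng(seed)
    # Reward of the max-player; the min-player receives its negation.
    reward = rng.uniform(0.0, 1.0, size=(S, A, B))
    # Transition kernel P(s' | s, a, b), normalized over next states.
    transition = rng.uniform(0.0, 1.0, size=(S, A, B, S))
    transition /= transition.sum(axis=-1, keepdims=True)
    return reward, transition, gamma
```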
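As a minimal illustration of the reported stepsize, here is an OGDA loop with η = 0.1 on a toy unconstrained bilinear saddle-point problem min_x max_y x^T A y. This is not the paper's Markov-game instantiation; the small random payoff matrix and iteration count are assumptions chosen so the toy problem converges.

```python
import numpy as np

eta = 0.1                                    # stepsize reported in the paper
rng = np.random.default_rng(0)
A = 0.1 * rng.standard_normal((10, 10))      # assumed payoff matrix, scaled small
x = rng.standard_normal(10)                  # min-player iterate
y = rng.standard_normal(10)                  # max-player iterate
gx_prev, gy_prev = A @ y, A.T @ x            # gradients at the previous iterate

for _ in range(2000):
    gx, gy = A @ y, A.T @ x                  # gradients of f(x, y) = x^T A y
    x = x - eta * (2.0 * gx - gx_prev)       # optimistic descent step
    y = y + eta * (2.0 * gy - gy_prev)       # optimistic ascent step
    gx_prev, gy_prev = gx, gy

# Both norms shrink toward the saddle point (0, 0) of the bilinear game.
print(np.linalg.norm(x), np.linalg.norm(y))
```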