Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games with Bandit Feedback
Authors: Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We revisit the problem of learning in two-player zero-sum Markov games, focusing on developing an algorithm that is uncoupled, convergent, and rational, with nonasymptotic convergence rates to Nash equilibrium. We start from the case of stateless matrix game with bandit feedback as a warm-up, showing an O(t 1 8 ) last-iterate convergence rate. To the best of our knowledge, this is the first result that obtains finite last-iterate convergence rate given access to only bandit feedback. We extend our result to the case of irreducible Markov games, providing a lastiterate convergence rate of O(t 1 9+ε ) for any ε > 0. Finally, we study Markov games without any assumptions on the dynamics, and show a path convergence rate, a new notion of convergence we define, of O(t 1 10 ). Our algorithm removes the coordination and prior knowledge requirement of [WLZL21a], which pursued the same goals as us for irreducible Markov games. Our algorithm is related to [CMZ21, CWC21] and also builds on the entropy regularization technique. However, we remove their requirement of communications on the entropy values, making our algorithm entirely uncoupled. |
| Researcher Affiliation | Academia | Yang Cai Yale University EMAIL Haipeng Luo University of Southern California EMAIL Chen-Yu Wei University of Virginia EMAIL Weiqiang Zheng Yale University EMAIL |
| Pseudocode | Yes | Algorithm 1 Matrix Game with Bandit Feedback Algorithm 2 Irreducible Markov Game Algorithm 3 General Markov Game |
| Open Source Code | No | The paper does not contain any statements about making source code publicly available, nor does it provide any links to a code repository. |
| Open Datasets | No | This paper is theoretical and focuses on algorithm design and convergence rates. It does not describe experiments run on specific datasets, and therefore does not specify public dataset availability in that context. |
| Dataset Splits | No | This paper is theoretical and focuses on algorithm design and convergence rates. It does not describe experiments with training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical, focusing on algorithms and their convergence properties, and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and describes algorithms and proofs. It does not list specific software dependencies with version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper defines parameters for its theoretical algorithms (e.g., ηt = t kη, βt = t kβ, ϵt = t kϵ in Algorithm 1), which are part of the algorithm's mathematical definition rather than an experimental setup for empirical evaluation. |