Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games
Authors: Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we propose a model-free stage-based algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the H dependence as model-based algorithms. Sample complexity bound. We show that our algorithm provably finds an ϵ-optimal Nash equilibrium for the twoplayer zero-sum Markov game in e O(H3SAB/ϵ2) episodes, which improves upon the sample complexity of all existing model-free algorithms for zero-sum Markov game. Our main technical development lies in establishing a few new properties on the cumulative occurrence of the large V-gap and the cumulative bonus term, which enable the upper-bounding of several new error terms arising due to the incorporation of the new min-gap based reference-advantage decomposition technique. |
| Researcher Affiliation | Academia | 1The University of Florida 2Princeton University 3The University of California, Santa Barbara 4The Pennsylvania State University 5The Ohio State University. |
| Pseudocode | Yes | Algorithm 1 Q-learning with min-gap based reference-advantage decomposition (Algorithm 3 sketch), Algorithm 2 Certified policy µout (max-player version), and Algorithm 3 Q-learning with min-gap based reference-advantage decomposition |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the methodology described. |
| Open Datasets | No | This is a theoretical paper and does not mention the use of any dataset for training, nor does it provide access information for any dataset. |
| Dataset Splits | No | This is a theoretical paper and does not describe experimental validation or dataset splits for reproduction. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations. |