Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

Authors: Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this work, we propose a model-free stage-based algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the H dependence as model-based algorithms. Sample complexity bound. We show that our algorithm provably finds an ϵ-optimal Nash equilibrium for the twoplayer zero-sum Markov game in e O(H3SAB/ϵ2) episodes, which improves upon the sample complexity of all existing model-free algorithms for zero-sum Markov game. Our main technical development lies in establishing a few new properties on the cumulative occurrence of the large V-gap and the cumulative bonus term, which enable the upper-bounding of several new error terms arising due to the incorporation of the new min-gap based reference-advantage decomposition technique.
Researcher Affiliation Academia 1The University of Florida 2Princeton University 3The University of California, Santa Barbara 4The Pennsylvania State University 5The Ohio State University.
Pseudocode Yes Algorithm 1 Q-learning with min-gap based reference-advantage decomposition (Algorithm 3 sketch), Algorithm 2 Certified policy µout (max-player version), and Algorithm 3 Q-learning with min-gap based reference-advantage decomposition
Open Source Code No The paper does not provide any statement or link indicating the availability of open-source code for the methodology described.
Open Datasets No This is a theoretical paper and does not mention the use of any dataset for training, nor does it provide access information for any dataset.
Dataset Splits No This is a theoretical paper and does not describe experimental validation or dataset splits for reproduction.
Hardware Specification No The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies No The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations.