When are Offline Two-Player Zero-Sum Markov Games Solvable?

Authors: Qiwen Cui, Simon S. Du

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single strategy concentration assumption is insufficient for learning the Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efficient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modification for two widely studied settings: dataset with uniform concentration assumption and turn-based Markov games.
Researcher Affiliation Academia Qiwen Cui Paul G. Allen School of Computer Science Engineering University of Washington qwcui@cs.washington.edu Simon S. Du Paul G. Allen School of Computer Science Engineering University of Washington ssdu@cs.washington.edu
Pseudocode Yes Algorithm 1 Pessimistic Nash Value Iteration (PNVI)
Open Source Code No The paper does not provide any specific links or statements about the availability of open-source code for the described methodology.
Open Datasets No The paper is theoretical and discusses dataset assumptions in a general sense, but it does not specify or use any particular publicly available or open dataset for training experiments.
Dataset Splits No The paper is theoretical and does not include empirical experiments, thus it does not specify training, validation, or test dataset splits.
Hardware Specification No The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies No The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not provide details on experimental setup, hyperparameters, or training configurations.