When are Offline Two-Player Zero-Sum Markov Games Solvable?
Authors: Qiwen Cui, Simon S. Du
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single strategy concentration assumption is insufficient for learning the Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efficient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modification for two widely studied settings: dataset with uniform concentration assumption and turn-based Markov games. |
| Researcher Affiliation | Academia | Qiwen Cui Paul G. Allen School of Computer Science Engineering University of Washington qwcui@cs.washington.edu Simon S. Du Paul G. Allen School of Computer Science Engineering University of Washington ssdu@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1 Pessimistic Nash Value Iteration (PNVI) |
| Open Source Code | No | The paper does not provide any specific links or statements about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and discusses dataset assumptions in a general sense, but it does not specify or use any particular publicly available or open dataset for training experiments. |
| Dataset Splits | No | The paper is theoretical and does not include empirical experiments, thus it does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide details on experimental setup, hyperparameters, or training configurations. |