reproducibilityindex.ai

When are Offline Two-Player Zero-Sum Markov Games Solvable?

Authors: Qiwen Cui, Simon S. Du

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study what dataset assumption permits solving ofﬂine two-player zero-sum Markov games. In stark contrast to the ofﬂine single-agent Markov decision process, we show that the single strategy concentration assumption is insufﬁcient for learning the Nash equilibrium (NE) strategy in ofﬂine two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efﬁcient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modiﬁcation for two widely studied settings: dataset with uniform concentration assumption and turn-based Markov games.
Researcher Affiliation	Academia	Qiwen Cui Paul G. Allen School of Computer Science Engineering University of Washington qwcui@cs.washington.edu Simon S. Du Paul G. Allen School of Computer Science Engineering University of Washington ssdu@cs.washington.edu
Pseudocode	Yes	Algorithm 1 Pessimistic Nash Value Iteration (PNVI)
Open Source Code	No	The paper does not provide any specific links or statements about the availability of open-source code for the described methodology.
Open Datasets	No	The paper is theoretical and discusses dataset assumptions in a general sense, but it does not specify or use any particular publicly available or open dataset for training experiments.
Dataset Splits	No	The paper is theoretical and does not include empirical experiments, thus it does not specify training, validation, or test dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not provide details on experimental setup, hyperparameters, or training configurations.