Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
When are Offline Two-Player Zero-Sum Markov Games Solvable?
Authors: Qiwen Cui, Simon S. Du
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study what dataset assumption permits solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single strategy concentration assumption is insufficient for learning the Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efficient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modification for two widely studied settings: dataset with uniform concentration assumption and turn-based Markov games. |
| Researcher Affiliation | Academia | Qiwen Cui Paul G. Allen School of Computer Science Engineering University of Washington EMAIL Simon S. Du Paul G. Allen School of Computer Science Engineering University of Washington EMAIL |
| Pseudocode | Yes | Algorithm 1 Pessimistic Nash Value Iteration (PNVI) |
| Open Source Code | No | The paper does not provide any specific links or statements about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and discusses dataset assumptions in a general sense, but it does not specify or use any particular publicly available or open dataset for training experiments. |
| Dataset Splits | No | The paper is theoretical and does not include empirical experiments, thus it does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide details on experimental setup, hyperparameters, or training configurations. |