Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Authors: Han Zhong, Wei Xiong, Jiyuan Tan, Liwei Wang, Tong Zhang, Zhaoran Wang, Zhuoran Yang
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. We propose a pessimism-based algorithm, dubbed as pessimistic minimax value iteration (PMVI)... Furthermore, we establish a data-dependent upper bound on the suboptimality which recovers a sublinear rate without the assumption on uniform coverage of the dataset. We also prove an information-theoretical lower bound, which shows our upper bound is nearly minimax optimal, which suggests that the datadependent term is intrinsic. To the best of our knowledge, we provide the first nearly minimax optimal result for offline MGs with function approximation. |
| Researcher Affiliation | Collaboration | 1Center for Data Science, Peking University 2The Hong Kong University of Science and Technology 3Fudan University 4Key Laboratory of Machine Perception, MOE, School of Artificial Intelligence, Peking University 5Google Research 6Northwestern University 7Yale University. |
| Pseudocode | Yes | Algorithm 1: Pessimistic Minimax Value Iteration |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code. |
| Open Datasets | No | The paper describes a theoretical "Offline Data Collecting Process" and uses a conceptual "dataset D" for proofs, but does not refer to a specific publicly available dataset used for training actual models. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the hardware used for computations. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for implementation. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs, thus it does not include details on experimental setup like hyperparameters or training configurations. |