Offline Learning in Markov Games with General Function Approximation
Authors: Yuheng Zhang, Yu Bai, Nan Jiang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium, such as a Nash equilibrium or a (coarse) correlated equilibrium, from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all three equilibria in a unified manner. By using Bellman-consistent pessimism, we obtain interval estimates of policy returns, and use both the upper and the lower bounds to obtain a relaxation of the gap of a candidate policy, which becomes our optimization objective. Our results generalize prior works and provide several additional insights. Importantly, we require a data coverage condition that improves over the recently proposed unilateral concentrability. Our condition allows selective coverage of deviation policies that optimally trade off between their greediness (as approximate best responses) and coverage, and we show scenarios where this leads to significantly better guarantees. As a new connection, we also show how our algorithmic framework can subsume seemingly different solution concepts designed for the special case of two-player zero-sum games. (An illustrative sketch of this interval-estimation objective appears below the table.) |
| Researcher Affiliation | Collaboration | ¹University of Illinois at Urbana-Champaign, ²Salesforce AI Research. Correspondence to: Nan Jiang <nanjiang@illinois.edu>. |
| Pseudocode | Yes | Algorithm 1 Bellman-Consistent Equilibrium Learning (BCEL) from an Offline Dataset |
| Open Source Code | No | The paper does not provide any statements about open-sourcing code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not use or make available any specific public datasets for training. It discusses a 'pre-collected historical dataset' and a data distribution d ∈ Δ(S × A) only as abstract concepts within its theoretical framework. |
| Dataset Splits | No | The paper is theoretical and does not specify training/test/validation dataset splits, as it does not conduct experiments on real datasets. |
| Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software or libraries with version numbers, as it does not report on empirical implementations or experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experimental setup details such as hyperparameters or training configurations for empirical evaluation. |
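
As context for the abstract and the Pseudocode row above, the following is a minimal, self-contained sketch of the idea the paper describes, not the authors' Algorithm 1 (BCEL). It instantiates the two ingredients in the simplest setting, a two-player zero-sum matrix game estimated from offline samples: interval (lower/upper) estimates of policy returns whose widths reflect data coverage, and a relaxed gap that scores a candidate policy by its pessimistic lower bound against optimistically evaluated deviations. All specifics here (the payoff matrix, sample counts, Hoeffding-style bonus, and candidate grid) are assumptions made only for the demonstration.

```python
# Illustrative sketch (NOT the paper's implementation): pessimistic interval
# estimation plus a relaxed-gap objective, in a two-player zero-sum matrix
# game whose payoffs are estimated from an unevenly covered offline dataset.
import numpy as np

rng = np.random.default_rng(0)
A, B = 2, 2                              # action counts for the two players
true_payoff = np.array([[0.8, 0.2],      # player 1's mean payoff (zero-sum)
                        [0.3, 0.6]])

# Offline dataset: uneven coverage, so some joint actions are barely visited.
counts = np.array([[50, 5],
                   [5, 50]])
means = np.array([[rng.normal(true_payoff[i, j], 0.1, counts[i, j]).mean()
                   for j in range(B)] for i in range(A)])
bonus = 1.0 / np.sqrt(counts)            # Hoeffding-style width (illustrative)
upper, lower = means + bonus, means - bonus

def interval_return(x, y):
    """Pessimistic/optimistic interval for player 1's return under mixed
    policies x (player 1) and y (player 2)."""
    return x @ lower @ y, x @ upper @ y

def relaxed_gap(x, y):
    """Relaxed Nash gap: each player's deviations are judged by their
    optimistic upper bounds, the candidate by its pessimistic lower bound."""
    lo, hi = interval_return(x, y)
    gap1 = max(upper[i, :] @ y for i in range(A)) - lo   # player 1 (maximizer)
    gap2 = hi - min(x @ lower[:, j] for j in range(B))   # player 2 (minimizer)
    return max(gap1, gap2)

# Score a grid of candidate policy pairs; output the relaxed-gap minimizer.
candidates = [(np.array([p, 1 - p]), np.array([q, 1 - q]))
              for p in (0.0, 0.25, 0.5, 0.75, 1.0)
              for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
best_x, best_y = min(candidates, key=lambda c: relaxed_gap(*c))
print("selected policies:", best_x, best_y,
      "relaxed gap:", round(relaxed_gap(best_x, best_y), 3))
```

Because poorly covered joint actions carry wide intervals, a greedy but badly covered deviation is penalized automatically; this mirrors the abstract's point that the required coverage condition only needs deviation policies that trade off greediness against coverage, rather than uniform unilateral coverage.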