Offline Learning in Markov Games with General Function Approximation

Authors: Yuheng Zhang, Yu Bai, Nan Jiang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium, such as a Nash equilibrium or a (Coarse) Correlated Equilibrium, from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all three equilibria in a unified manner. By using Bellman-consistent pessimism, we obtain interval estimation of policy returns, and use both the upper and the lower bounds to obtain a relaxation on the gap of a candidate policy, which becomes our optimization objective. Our results generalize prior works and provide several additional insights. Importantly, we require a data coverage condition that improves over the recently proposed unilateral concentrability. Our condition allows selective coverage of deviation policies that optimally trade off between their greediness (as approximate best responses) and coverage, and we show scenarios where this leads to significantly better guarantees. As a new connection, we also show how our algorithmic framework can subsume seemingly different solution concepts designed for the special case of two-player zero-sum games. (See the illustrative sketch after this table.)
Researcher Affiliation | Collaboration | 1 University of Illinois at Urbana-Champaign, 2 Salesforce AI Research. Correspondence to: Nan Jiang <nanjiang@illinois.edu>.
Pseudocode | Yes | Algorithm 1: Bellman-Consistent Equilibrium Learning (BCEL) from an Offline Dataset
Open Source Code | No | The paper does not provide any statements about open-sourcing code or links to a code repository.
Open Datasets | No | The paper is theoretical and does not use or make available any specific public datasets for training. It discusses a 'pre-collected historical dataset' and a 'data distribution d ∈ Δ(S × A)' as abstract concepts for its theoretical framework.
Dataset Splits | No | The paper is theoretical and does not specify training/test/validation dataset splits, as it does not conduct experiments on real datasets.
Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not mention specific software or libraries with version numbers, as it does not report on empirical implementations or experiments.
Experiment Setup | No | The paper is theoretical and does not describe specific experimental setup details such as hyperparameters or training configurations for empirical evaluation.
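
The abstract above describes the paper's core optimization objective: pessimistic lower bounds on a candidate policy's returns, optimistic upper bounds on the returns of each player's deviation, and selection of the candidate with the smallest relaxed gap. The Python sketch below is a minimal, hypothetical illustration of that objective only; the interval oracle, the finite candidate set, and all names are assumptions for illustration and are not the paper's actual Algorithm 1 (BCEL) or its function-approximation machinery.

```python
# Hypothetical sketch: choose the candidate joint policy whose relaxed equilibrium
# gap (optimistic deviation value minus pessimistic policy value) is smallest.
# The interval oracle is assumed to be built from the offline dataset elsewhere.

from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Policy = str  # stand-in for a joint policy; real use would involve function approximators


@dataclass
class IntervalOracle:
    """Assumed confidence-interval oracle derived from the offline data.

    lower(pi, i): pessimistic lower bound on player i's return under joint policy pi.
    upper_dev(pi, i): optimistic upper bound on player i's return when player i
        deviates (approximately best-responds) while the other players follow pi.
    """
    lower: Callable[[Policy, int], float]
    upper_dev: Callable[[Policy, int], float]


def relaxed_gap(pi: Policy, n_players: int, oracle: IntervalOracle) -> float:
    """Upper bound on the equilibrium gap of pi, formed from the interval estimates."""
    return max(oracle.upper_dev(pi, i) - oracle.lower(pi, i) for i in range(n_players))


def select_policy(candidates: List[Policy], n_players: int,
                  oracle: IntervalOracle) -> Tuple[Policy, float]:
    """Return the candidate with the smallest relaxed gap (the optimization objective)."""
    gaps = {pi: relaxed_gap(pi, n_players, oracle) for pi in candidates}
    best = min(gaps, key=gaps.get)
    return best, gaps[best]


if __name__ == "__main__":
    # Toy two-player numbers, purely for illustration.
    lower_tbl: Dict[Tuple[Policy, int], float] = {
        ("pi_A", 0): 0.40, ("pi_A", 1): 0.35,
        ("pi_B", 0): 0.30, ("pi_B", 1): 0.45,
    }
    upper_dev_tbl: Dict[Tuple[Policy, int], float] = {
        ("pi_A", 0): 0.55, ("pi_A", 1): 0.50,
        ("pi_B", 0): 0.70, ("pi_B", 1): 0.52,
    }
    oracle = IntervalOracle(
        lower=lambda pi, i: lower_tbl[(pi, i)],
        upper_dev=lambda pi, i: upper_dev_tbl[(pi, i)],
    )
    pi_hat, gap = select_policy(["pi_A", "pi_B"], n_players=2, oracle=oracle)
    print(f"selected {pi_hat} with relaxed gap {gap:.2f}")
```

In this toy example the sketch selects pi_A, whose worst-case interval gap (0.15) is smaller than pi_B's (0.40), mirroring how using both upper and lower bounds turns the unknown equilibrium gap into a data-driven quantity that can be minimized over candidates.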