Sample-Efficient Multi-Agent RL: An Optimization Perspective
Authors: Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under general function approximation. In order to find the minimum assumption for sample-efficient learning, we introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs. Using this measure, we propose the first unified algorithmic framework that ensures sample efficiency in learning Nash Equilibrium, Coarse Correlated Equilibrium, and Correlated Equilibrium for both model-based and model-free MARL problems with low MADC. We also show that our algorithm provides comparable sublinear regret to the existing works. Moreover, our algorithm only requires an equilibrium-solving oracle and an oracle that solves regularized supervised learning, and thus avoids solving constrained optimization problems within data-dependent constraints (Jin et al., 2020a; Wang et al., 2023) or executing sampling procedures with complex multi-objective optimization problems (Foster et al., 2023). Moreover, the model-free version of our algorithms is the first provably efficient model-free algorithm for learning Nash equilibrium of general-sum MGs. |
| Researcher Affiliation | Academia | Nuoya Xiong (IIIS, Tsinghua University, xiongny20@mails.tsinghua.edu.cn); Zhihan Liu (Northwestern University, zhihanliu2027@u.northwestern.edu); Zhaoran Wang (Northwestern University, zhaoranwang@gmail.com); Zhuoran Yang (Yale University, zhuoran.yang@yale.edu) |
| Pseudocode | Yes | Algorithm 1 Multi-Agent Maximize-to-EXplore (MAMEX) |
| Open Source Code | No | The paper does not provide any statements or links indicating that open-source code for the described methodology is available. |
| Open Datasets | No | The paper is theoretical and describes data collection for online learning ('data collected via online interactions') but does not specify or provide access to any particular public or open dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not perform empirical experiments, so no dataset splits (training, validation, test) are specified. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | No | The paper is theoretical and discusses algorithmic parameters like 'η' but does not provide details of an experimental setup such as hyperparameters or system-level training settings for empirical validation. |