Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Sample-Efficient Multi-Agent RL: An Optimization Perspective
Authors: Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation. To identify minimal assumptions for sample-efficient learning, we introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs. Using this measure, we propose the first unified algorithmic framework that ensures sample efficiency in learning Nash Equilibrium, Coarse Correlated Equilibrium, and Correlated Equilibrium for both model-based and model-free MARL problems with low MADC. We also show that our algorithm achieves sublinear regret comparable to existing works. Moreover, our algorithm only requires an equilibrium-solving oracle and an oracle that solves regularized supervised learning, and thus avoids solving constrained optimization problems within data-dependent constraints (Jin et al., 2020a; Wang et al., 2023) or executing sampling procedures with complex multi-objective optimization problems (Foster et al., 2023). Finally, the model-free version of our algorithm is the first provably efficient model-free algorithm for learning Nash equilibria of general-sum MGs. |
| Researcher Affiliation | Academia | Nuoya Xiong (IIIS, Tsinghua University); Zhihan Liu (Northwestern University); Zhaoran Wang (Northwestern University); Zhuoran Yang (Yale University) |
| Pseudocode | Yes | Algorithm 1 Multi-Agent Maximize-to-EXplore (MAMEX) |
| Open Source Code | No | The paper does not provide any statements or links indicating that open-source code for the described methodology is available. |
| Open Datasets | No | The paper is theoretical and describes data collection for online learning ('data collected via online interactions') but does not specify or provide access to any particular public or open dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not perform empirical experiments, thus no dataset splits (training, validation, test) are specified for reproducibility. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | No | The paper is theoretical and discusses algorithmic parameters like 'η' but does not provide details of an experimental setup such as hyperparameters or system-level training settings for empirical validation. |
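The quoted abstract reduces the paper's framework to two oracles: one solving a regularized supervised-learning problem and one solving for an equilibrium. As a loose illustration only (not the paper's MAMEX algorithm, which the paper states only as Algorithm 1 pseudocode), the sketch below combines a ridge-style shrinkage estimator for noisy payoffs with regret matching, a standard procedure whose empirical play approximates a coarse correlated equilibrium. The toy 2x2 game, function names, and all parameters are hypothetical.

```python
import random

# Hypothetical sketch, NOT the paper's MAMEX: illustrates the two oracles the
# abstract names. (1) A regularized supervised-learning oracle, here a
# ridge-style estimate that shrinks empirical payoff means toward a prior.
# (2) An equilibrium-solving oracle, here regret matching, whose empirical
# play approximates a coarse correlated equilibrium (CCE) of the input game.

def regularized_fit(samples, prior=0.0, lam=1.0):
    """Shrink the empirical mean of noisy payoff samples toward `prior`."""
    return (sum(samples) + lam * prior) / (len(samples) + lam)

def regret_matching(payoff, n_actions=2, iters=2000, seed=0):
    """payoff[i][a][b]: payoff to player i under joint action (a, b).
    Returns each player's empirical action frequencies over `iters` rounds."""
    rng = random.Random(seed)
    regrets = [[0.0] * n_actions for _ in range(2)]
    counts = [[0] * n_actions for _ in range(2)]
    for _ in range(iters):
        acts = []
        for i in range(2):
            pos = [max(r, 0.0) for r in regrets[i]]
            s = sum(pos)
            probs = [p / s for p in pos] if s > 0 else [1 / n_actions] * n_actions
            acts.append(rng.choices(range(n_actions), probs)[0])
        a, b = acts
        counts[0][a] += 1
        counts[1][b] += 1
        # Accumulate counterfactual regret for each alternative action.
        for alt in range(n_actions):
            regrets[0][alt] += payoff[0][alt][b] - payoff[0][a][b]
            regrets[1][alt] += payoff[1][a][alt] - payoff[1][a][b]
    return [[c / iters for c in counts[i]] for i in range(2)]

# Toy 2x2 general-sum game in which action 0 strictly dominates for both
# players; payoffs are observed with Gaussian noise, then estimated.
rng = random.Random(1)
true_payoff = [[[3, 3], [0, 0]], [[3, 0], [3, 0]]]
est = [[[regularized_fit([true_payoff[i][a][b] + rng.gauss(0, 0.1)
                          for _ in range(50)])
         for b in range(2)] for a in range(2)] for i in range(2)]
freqs = regret_matching(est)  # both players should concentrate on action 0
```

This two-oracle shape mirrors the abstract's claim that the framework avoids constrained optimization over data-dependent sets: all learning is delegated to the regression step, and all strategic computation to the equilibrium solver.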