Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
Authors: Laixi Shi, Eric Mazumdar, Yuejie Chi, Adam Wierman
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DR-NVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DR-NVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length. |
| Researcher Affiliation | Academia | Laixi Shi¹, Eric Mazumdar¹, Yuejie Chi², Adam Wierman¹. ¹Department of Computing and Mathematical Sciences, California Institute of Technology, CA 91125, USA. ²Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA. |
| Pseudocode | Yes | Algorithm 1 Distributionally robust equilibrium value iteration (DR-NVI). |
| Open Source Code | No | The paper does not mention providing any open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments with specific datasets. It mentions assuming access to a "generative model" for theoretical sampling, but this is not a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits (training, validation, or test data). |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or training settings for practical implementation or experiments. |