A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
Authors: Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon Shaolei Du
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our algorithms can achieve $\widetilde{O}(\Delta^{1/4} T^{3/4})$ regret when the degree of nonstationarity, as measured by total variation $\Delta$, is known, and $\widetilde{O}(\Delta^{1/5} T^{4/5})$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on number of agents from the oracles. As a side contribution that may be of independent interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, which includes Nash equilibria, correlated equilibria, and coarse correlated equilibria. |
| Researcher Affiliation | Academia | Haozhe Jiang¹, Qiwen Cui², Zhihan Xiong², Maryam Fazel², Simon S. Du²; ¹ Institute for Interdisciplinary Information Sciences, Tsinghua University; ² University of Washington |
| Pseudocode | Yes | Algorithm 1: Restarted Explore-then-Commit for Non-stationary MARL; Algorithm 2: Multi-scale Testing for Non-stationary MARL; Protocol 1: TEST_EQ; Protocol 2: Scheduling TEST_EQ in a block with length $2^n$ |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical studies that would involve training on specific datasets. It discusses theoretical bounds and algorithms. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical studies that would involve dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper focuses on theoretical contributions and algorithm design; it does not report on empirical experiments requiring specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not detail specific software dependencies with version numbers required to reproduce experiments. |
| Experiment Setup | No | The paper is theoretical and does not conduct empirical experiments, thus no details regarding hyperparameters, training configurations, or system-level settings for experiments are provided. |
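
The two regret rates quoted in the Research Type row can be compared numerically. The following is only an illustrative sketch, not code from the paper: the function names are hypothetical, and the constants and logarithmic factors hidden by the $\widetilde{O}$ notation are ignored, so the numbers show scaling only.

```python
# Illustrative only: leading-order terms of the two regret bounds,
# with constants and log factors (hidden by O-tilde) dropped.
# Function names are hypothetical and do not come from the paper.


def regret_rate_known(delta: float, T: int) -> float:
    """Leading term Delta^(1/4) * T^(3/4), for known total variation Delta."""
    return delta ** 0.25 * T ** 0.75


def regret_rate_unknown(delta: float, T: int) -> float:
    """Leading term Delta^(1/5) * T^(4/5), for unknown total variation Delta."""
    return delta ** 0.2 * T ** 0.8


if __name__ == "__main__":
    T = 10_000
    for delta in (1.0, 10.0, 100.0):
        known = regret_rate_known(delta, T)
        unknown = regret_rate_unknown(delta, T)
        # Both terms stay below T (sublinear regret) whenever delta < T.
        print(f"Delta={delta:>6}: known={known:.1f}, unknown={unknown:.1f}, T={T}")
```

Under these simplifications, both leading terms are smaller than $T$ exactly when $\Delta < T$, and the unknown-$\Delta$ term exceeds the known-$\Delta$ term by a factor of $(T/\Delta)^{1/20}$.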