A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Authors: Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon Shaolei Du

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our algorithms can achieve $\widetilde{O}(\Delta^{1/4}T^{3/4})$ regret when the degree of non-stationarity, as measured by the total variation $\Delta$, is known, and $\widetilde{O}(\Delta^{1/5}T^{4/5})$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on the number of agents from the oracles. As a side contribution that may be of independent interest, we show how to test for various types of equilibria, including Nash equilibria, correlated equilibria, and coarse correlated equilibria, by a black-box reduction to single-agent learning.
Researcher Affiliation | Academia | Haozhe Jiang (1), Qiwen Cui (2), Zhihan Xiong (2), Maryam Fazel (2), Simon S. Du (2). (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) University of Washington.
Pseudocode | Yes | Algorithm 1: Restarted Explore-then-Commit for Non-stationary MARL; Algorithm 2: Multi-scale Testing for Non-stationary MARL; Protocol 1: TEST_EQ; Protocol 2: Scheduling TEST_EQ in a block with length $2^n$.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not conduct empirical studies that would involve training on specific datasets; it discusses theoretical bounds and algorithms.
Dataset Splits | No | The paper is theoretical and does not conduct empirical studies that would involve dataset splits for training, validation, or testing.
Hardware Specification | No | The paper focuses on theoretical contributions and algorithm design; it does not report empirical experiments that would require specific hardware.
Software Dependencies | No | The paper is theoretical and does not list software dependencies or version numbers needed to reproduce experiments.
Experiment Setup | No | The paper is theoretical and does not conduct empirical experiments, so no hyperparameters, training configurations, or system-level settings are provided.
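To make the two regret rates in the Research Type row concrete, here is a minimal sketch that evaluates them with constants and logarithmic factors dropped. This is an assumption: the $\widetilde{O}$ notation hides both, so only the scaling is meaningful, and the function name `regret_scalings` and the choice $\Delta = \sqrt{T}$ in the loop are hypothetical illustrations, not taken from the paper. Dividing each bound by $T$ gives average per-round regret $(\Delta/T)^{1/4}$ (known $\Delta$) and $(\Delta/T)^{1/5}$ (unknown $\Delta$), so both vanish whenever $\Delta$ grows slower than $T$. Moreover, $\Delta^{1/5}T^{4/5} \ge \Delta^{1/4}T^{3/4}$ exactly when $\Delta \le T$, which quantifies the price of not knowing $\Delta$.

```python
def regret_scalings(delta: float, T: int) -> tuple[float, float]:
    """Return (known-Delta, unknown-Delta) regret scalings.

    Hypothetical helper: constants and log factors hidden by O-tilde are ignored.
    """
    known = delta ** 0.25 * T ** 0.75   # Delta^{1/4} T^{3/4}
    unknown = delta ** 0.2 * T ** 0.8   # Delta^{1/5} T^{4/5}
    return known, unknown


for T in (10**4, 10**6, 10**8):
    # Hypothetical non-stationarity budget that grows like sqrt(T).
    delta = T ** 0.5
    known, unknown = regret_scalings(delta, T)
    # Average per-round regret; both columns shrink as T grows.
    print(f"T={T:>9}  known/T={known / T:.4f}  unknown/T={unknown / T:.4f}")
```

With $\Delta = \sqrt{T}$ the two averages decay like $T^{-1/8}$ and $T^{-1/10}$, so the unknown-$\Delta$ column stays larger but still tends to zero.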