Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
Authors: Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon Shaolei Du
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our algorithms can achieve $\widetilde{O}(\Delta^{1/4}T^{3/4})$ regret when the degree of nonstationarity, as measured by total variation $\Delta$, is known, and $\widetilde{O}(\Delta^{1/5}T^{4/5})$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on number of agents from the oracles. As a side contribution that may be independent of interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, which includes Nash equilibria, correlated equilibria, and coarse correlated equilibria. |
| Researcher Affiliation | Academia | Haozhe Jiang¹, Qiwen Cui², Zhihan Xiong², Maryam Fazel², Simon S. Du²; ¹Institute for Interdisciplinary Information Sciences, Tsinghua University; ²University of Washington |
| Pseudocode | Yes | Algorithm 1: Restarted Explore-then-Commit for Non-stationary MARL; Algorithm 2: Multi-scale Testing for Non-stationary MARL; Protocol 1: TEST_EQ; Protocol 2: Scheduling TEST_EQ in a block with length $2^n$ |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical studies that would involve training on specific datasets. It discusses theoretical bounds and algorithms. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical studies that would involve dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper focuses on theoretical contributions and algorithm design; it does not report on empirical experiments requiring specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not detail specific software dependencies with version numbers required to reproduce experiments. |
| Experiment Setup | No | The paper is theoretical and does not conduct empirical experiments, thus no details regarding hyperparameters, training configurations, or system-level settings for experiments are provided. |