Multi-Agent Reinforcement Learning in Stochastic Networked Systems

Authors: Yiheng Lin, Guannan Qu, Longbo Huang, Adam Wierman

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems."
Researcher Affiliation | Academia | Yiheng Lin (CMS, Caltech, yihengl@caltech.edu); Guannan Qu (CMS, Caltech, gqu@caltech.edu); Longbo Huang (IIIS, Tsinghua University, longbohuang@tsinghua.edu.cn); Adam Wierman (CMS, Caltech, adamw@caltech.edu)
Pseudocode | Yes | Algorithm 1: Scalable Actor Critic
Open Source Code | No | The paper does not provide links to, or explicit statements about, a release of source code for the described methodology.
Open Datasets | No | The paper focuses on theoretical analysis and algorithm design rather than empirical evaluation with datasets, and does not mention specific datasets or their public availability.
Dataset Splits | No | The paper is theoretical and does not involve empirical data splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used to run experiments.
Software Dependencies | No | The paper focuses on theoretical analysis and algorithm design, and does not list software dependencies with version numbers for an experimental setup.
Experiment Setup | No | The paper presents an algorithm and its convergence properties rather than an empirical experimental setup with hyperparameters or system-level training settings.
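The abstract quoted above mentions finite-time convergence results for temporal difference learning with state aggregation. As a rough illustration of that setting only (not the paper's Scalable Actor Critic algorithm, and with hypothetical function and parameter names), a minimal tabular TD(0) sketch where values are stored per aggregated cluster rather than per state might look like:

```python
import numpy as np

def td0_state_aggregation(transitions, aggregate, n_clusters,
                          alpha=0.1, gamma=0.9):
    """TD(0) with state aggregation: one value parameter per cluster.

    transitions: iterable of (state, reward, next_state) tuples
    aggregate:   maps a state to its cluster index in [0, n_clusters)
    """
    theta = np.zeros(n_clusters)  # value estimate per aggregated cluster
    for s, r, s_next in transitions:
        k, k_next = aggregate(s), aggregate(s_next)
        # Standard TD(0) error, but both bootstrap target and update
        # operate on the cluster-level parameters.
        td_error = r + gamma * theta[k_next] - theta[k]
        theta[k] += alpha * td_error
    return theta

# Example: a 4-state cyclic chain, pairs of states merged into 2 clusters.
transitions = [(0, 1.0, 1), (1, 0.0, 2), (2, 1.0, 3), (3, 0.0, 0)] * 50
theta = td0_state_aggregation(transitions, lambda s: s // 2, n_clusters=2)
```

With state aggregation, TD(0) converges to the best value approximation representable at the cluster level, not the exact per-state values; the paper's contribution is a finite-time (rather than asymptotic) error bound for this kind of scheme.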