Multi-Agent Reinforcement Learning in Stochastic Networked Systems
Authors: Yiheng Lin, Guannan Qu, Longbo Huang, Adam Wierman
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems. |
| Researcher Affiliation | Academia | Yiheng Lin (CMS, Caltech, yihengl@caltech.edu); Guannan Qu (CMS, Caltech, gqu@caltech.edu); Longbo Huang (IIIS, Tsinghua University, longbohuang@tsinghua.edu.cn); Adam Wierman (CMS, Caltech, adamw@caltech.edu) |
| Pseudocode | Yes | Algorithm 1 Scalable Actor Critic |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the described methodology. |
| Open Datasets | No | The paper focuses on theoretical analysis and algorithm design rather than empirical evaluation with datasets. Therefore, it does not mention specific datasets or their public availability for training. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper focuses on theoretical analysis and algorithm design, and thus does not list specific software dependencies with version numbers for experimental setup. |
| Experiment Setup | No | The paper is theoretical, presenting an algorithm and its convergence properties, rather than detailing an empirical experimental setup with hyperparameters or system-level training settings. |
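The pseudocode row above references Algorithm 1 (Scalable Actor Critic). The full algorithm is not reproduced in this report, but one of the building blocks the paper analyzes, temporal difference learning with state aggregation, can be sketched in a few lines. Everything below (the toy random-walk chain, the aggregation map, all function and variable names) is an illustrative assumption for exposition, not the authors' code.

```python
import random

def td_with_aggregation(n_states, n_clusters, transition, reward,
                        steps=10000, alpha=0.1, gamma=0.9, seed=0):
    """Estimate state values while sharing one table entry per cluster.

    State aggregation: states are mapped onto a smaller set of clusters,
    and TD(0) updates the value of the cluster rather than the raw state.
    """
    rng = random.Random(seed)
    cluster = lambda s: s * n_clusters // n_states  # fixed aggregation map
    v = [0.0] * n_clusters                          # one value per cluster
    s = 0
    for _ in range(steps):
        s_next = transition(s, rng)
        # Standard TD(0) error, computed on the aggregated value table
        td_error = reward(s) + gamma * v[cluster(s_next)] - v[cluster(s)]
        v[cluster(s)] += alpha * td_error
        s = s_next
    return v

# Toy random-walk chain of 10 states: step left/right uniformly at random,
# reward 1 only in the rightmost state; 5 clusters of 2 states each.
values = td_with_aggregation(
    n_states=10, n_clusters=5,
    transition=lambda s, rng: max(0, min(9, s + rng.choice([-1, 1]))),
    reward=lambda s: 1.0 if s == 9 else 0.0,
)
```

As expected for this toy chain, the learned cluster values increase toward the rewarding end of the chain; the sketch only illustrates the aggregated-update structure, not the paper's finite-time error bound.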