Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Authors: Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Basar
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first evaluate our algorithms with linear function approximation... It is shown in Figure 2 that the proposed algorithms successfully converge even with such nonlinear function approximators... |
| Researcher Affiliation | Collaboration | 1) Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA; 2) Department of Operations Research and Financial Engineering, Princeton University, USA; 3) Department of Electrical Engineering and Computer Science and Statistics, Northwestern University, USA; 4) Tencent AI Lab, China. |
| Pseudocode | Yes | We refer to the steps (3.3)-(3.5) as Algorithm 1, whose pseudocode is provided in A in the appendix. (A hedged illustrative sketch of this consensus-style update appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We consider the MARL task of Cooperative Navigation from Lowe et al. (2017). To be compatible with our networked multi-agent MDP, we modify the environment and provide the details in E.2 in the appendix. |
| Dataset Splits | No | The paper describes experiments in a reinforcement learning environment run over 'episodes', but does not specify traditional train/validation/test dataset splits (e.g., 80/10/10) as would be used in supervised learning. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., specific GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the implementation or experiments. |
| Experiment Setup | No | The paper mentions details on the model and environment in the appendix (E.1, E.2) but does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training settings in the main text. |
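For readers of the Pseudocode row above: the paper's Algorithm 1 couples each agent's local actor-critic update with a consensus (neighbor-averaging) step over the communication network. The snippet below is not the paper's algorithm; it is a minimal, hypothetical sketch of that consensus actor-critic pattern, assuming a linear TD(0) state-value critic with discounting, a per-agent softmax actor, and a fixed doubly stochastic weight matrix over a ring network. All names, dimensions, and learning rates (`consensus_actor_critic_step`, `FEAT_DIM`, `LR_CRITIC`, etc.) are placeholders, and the paper's actual updates (3.3)-(3.5) and appendix pseudocode differ in form, so treat this only as a structural illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative problem dimensions (not taken from the paper) ---
N_AGENTS = 5        # agents on a communication network
FEAT_DIM = 8        # state feature dimension for the linear critic
ACT_DIM = 3         # discrete actions per agent
GAMMA = 0.95        # discount factor (a simplification of the paper's setting)
LR_CRITIC = 0.05
LR_ACTOR = 0.01

# Doubly stochastic consensus weights over a ring network (hypothetical choice).
C = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    C[i, i] = 0.5
    C[i, (i + 1) % N_AGENTS] = 0.25
    C[i, (i - 1) % N_AGENTS] = 0.25

# Each agent keeps its own critic weights W[i] and actor weights THETA[i].
W = rng.normal(scale=0.1, size=(N_AGENTS, FEAT_DIM))
THETA = rng.normal(scale=0.1, size=(N_AGENTS, ACT_DIM, FEAT_DIM))


def softmax_policy(theta_i, phi):
    """Boltzmann policy over discrete actions from shared state features phi."""
    logits = theta_i @ phi
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()


def consensus_actor_critic_step(phi, phi_next, rewards):
    """One synchronous update: local TD and policy-gradient steps per agent,
    followed by a consensus (weighted averaging) step on the critic parameters."""
    global W, THETA
    W_tilde = np.empty_like(W)
    for i in range(N_AGENTS):
        # Local TD(0) critic update using agent i's private reward.
        td_error = rewards[i] + GAMMA * W[i] @ phi_next - W[i] @ phi
        W_tilde[i] = W[i] + LR_CRITIC * td_error * phi

        # Local actor update: score-function gradient scaled by the TD error.
        # The sampled action only illustrates the score-function term here;
        # the toy loop below does not simulate a real transition.
        probs = softmax_policy(THETA[i], phi)
        a = rng.choice(ACT_DIM, p=probs)
        grad_log = -np.outer(probs, phi)
        grad_log[a] += phi
        THETA[i] += LR_ACTOR * td_error * grad_log

    # Consensus step: each agent mixes its neighbors' intermediate critic weights.
    W = C @ W_tilde


# Toy usage with random features and rewards (no real environment here).
for _ in range(100):
    phi = rng.normal(size=FEAT_DIM)
    phi_next = rng.normal(size=FEAT_DIM)
    rewards = rng.normal(size=N_AGENTS)
    consensus_actor_critic_step(phi, phi_next, rewards)

print("per-agent critic disagreement:", np.linalg.norm(W - W.mean(axis=0), axis=1))
```

In this sketch only the critic parameters are mixed through the consensus matrix while each agent's policy stays local, which is one common design for networked MARL with private rewards; whether it matches the exact parameter-sharing scheme in the paper's Appendix A pseudocode should be checked against the paper itself.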