Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

Authors: Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Basar

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first evaluate our algorithms with linear function approximation... It is shown in Figure 2 that the proposed algorithms successfully converge even with such nonlinear function approximators...
Researcher Affiliation | Collaboration | (1) Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA; (2) Department of Operations Research and Financial Engineering, Princeton University, USA; (3) Department of Electrical Engineering and Computer Science and Statistics, Northwestern University, USA; (4) Tencent AI Lab, China.
Pseudocode | Yes | We refer to the steps (3.3)-(3.5) as Algorithm 1, whose pseudocode is provided in Appendix A.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | We consider the MARL task of Cooperative Navigation from Lowe et al. (2017). To be compatible with our networked multi-agent MDP, we modify the environment and provide the details in Appendix E.2.
Dataset Splits | No | The paper describes experiments in a reinforcement learning environment over 'episodes' but does not specify traditional train/validation/test splits (e.g., 80/10/10) of the kind used in supervised learning.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the implementation or experiments.
Experiment Setup | No | The paper defers model and environment details to Appendices E.1 and E.2; the main text does not state specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training settings.
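The Pseudocode row above refers to Algorithm 1, the networked actor-critic updates (3.3)-(3.5), whose core idea is a local temporal-difference update at each agent followed by consensus averaging of critic parameters over the communication network. The snippet below is a minimal illustrative sketch of that consensus pattern, not the authors' implementation: the function name, step sizes, and weight matrix are all hypothetical placeholders.

```python
import numpy as np

def consensus_critic_step(omega, phi_s, phi_s_next, rewards, C,
                          beta=0.05, gamma=0.95):
    """One synchronous round of local TD(0) plus consensus averaging.

    Hypothetical sketch of the consensus idea behind Algorithm 1;
    all names and constants here are illustrative, not from the paper.

    omega      : (N, d) array, per-agent linear critic parameters
    phi_s      : (d,) feature vector of the current global state
    phi_s_next : (d,) feature vector of the next global state
    rewards    : (N,) per-agent local rewards
    C          : (N, N) doubly stochastic consensus weight matrix
    """
    N, d = omega.shape
    updated = np.empty_like(omega)
    for i in range(N):
        # Local TD error computed from agent i's own reward and critic.
        delta = rewards[i] + gamma * phi_s_next @ omega[i] - phi_s @ omega[i]
        updated[i] = omega[i] + beta * delta * phi_s
    # Consensus step: each agent mixes its parameters with its neighbors'.
    return C @ updated
```

With a fully mixing weight matrix (all entries 1/N), one round already drives every agent to the same averaged critic; sparser, network-constrained choices of C only mix with neighbors and reach agreement gradually.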