Coordination Between Individual Agents in Multi-Agent Reinforcement Learning

Authors: Yang Zhang, Qingyu Yang, Dou An, Chengwei Zhang (pp. 11387-11394)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that the proposed method outperforms the state-of-the-art MARL methods and can measure the correlation between individual agents accurately.
Researcher Affiliation | Academia | Yang Zhang (1), Qingyu Yang (1,2,3)*, Dou An (1,2,3), Chengwei Zhang (4). (1) Faculty of Electronics and Information, Xi'an Jiaotong University, Xi'an, China; (2) State Key Laboratory for Manufacturing System Engineering (SKLMSE), Xi'an Jiaotong University, Xi'an, China; (3) Ministry of Education (MOE) Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, China; (4) Dalian Maritime University, Dalian, China
Pseudocode | Yes | Algorithm 1: ALC-MADDPG algorithm
Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | The experiments were performed in four mixed competitive-cooperative environments, which can be found in (Lowe et al. 2017; Mordatch and Abbeel 2018).
Dataset Splits | No | The paper refers to 'training episodes' and the 'mean received reward of agents in an episode' but does not explicitly specify training, validation, or test dataset splits (e.g., percentages or counts) as commonly defined in supervised learning.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper mentions various MARL methods (e.g., MADDPG, VDN, QMIX) and reinforcement learning algorithms (DQN, DDPG), but it does not specify any software libraries or dependencies with version numbers used for the implementation.
Experiment Setup | Yes | In each actor and critic network, there were two fully-connected layers, each consisting of 64 neurons. The parameters τ, γ, the learning rate, and the size of the replay memory were set to 0.1, 0.95, 0.01, and 100,000, respectively.
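Since the paper's code is not public, the setup row above is the only concrete configuration information available. Below is a minimal sketch of what that configuration could look like; it is not the authors' implementation. PyTorch, the class names (Actor, Critic, soft_update), and the centralized-critic input layout are assumptions borrowed from standard MADDPG practice, and the replay-memory size is read as 100,000 from the garbled figure in the paper.

```python
# Illustrative sketch only: a two-layer (64-unit) actor/critic pair with the
# hyperparameters quoted in the "Experiment Setup" row. Names and structure are
# hypothetical; the paper's ALC-MADDPG code is not publicly available.
import torch
import torch.nn as nn

TAU = 0.1              # soft-update coefficient for target networks
GAMMA = 0.95           # discount factor
LEARNING_RATE = 0.01   # learning rate for actor and critic optimizers
REPLAY_SIZE = 100_000  # replay-memory capacity (assumed reading of the setup row)

class Actor(nn.Module):
    """Maps an agent's local observation to a continuous action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Critic(nn.Module):
    """Centralized critic: scores the joint observation and joint action."""
    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def soft_update(target: nn.Module, source: nn.Module, tau: float = TAU) -> None:
    """Polyak-average the source parameters into the target network."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)
```

Each agent would hold one Actor and one Critic (plus target copies updated with soft_update), with optimizers constructed at LEARNING_RATE; how the paper's attention-based correlation measure plugs into this training loop cannot be reconstructed from the reproducibility details alone.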