Coordination Between Individual Agents in Multi-Agent Reinforcement Learning
Authors: Yang Zhang, Qingyu Yang, Dou An, Chengwei Zhang
AAAI 2021, pp. 11387-11394 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that the proposed method outperforms the state-of-the-art MARL methods and can measure the correlation between individual agents accurately. |
| Researcher Affiliation | Academia | Yang Zhang,1 Qingyu Yang,1,2,3* Dou An,1,2,3 Chengwei Zhang4 1Faculty of Electronics and Information, Xi'an Jiaotong University, Xi'an, China 2State Key Laboratory for Manufacturing System Engineering (SKLMSE), Xi'an Jiaotong University, Xi'an, China 3Ministry of Education (MOE) Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, China 4Dalian Maritime University, Dalian, China |
| Pseudocode | Yes | Algorithm 1 ALC-MADDPG algorithm |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | The experiments were performed in four mixed competitive-cooperative environments, which can be found in (Lowe et al. 2017; Mordatch and Abbeel 2018). See the environment sketch below the table. |
| Dataset Splits | No | The paper refers to 'training episodes' and 'mean received reward of agents in an episode' but does not explicitly specify training, validation, or test dataset splits (e.g., percentages or counts) as commonly defined in supervised learning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used to run the experiments. |
| Software Dependencies | No | The paper mentions various MARL methods (e.g., MADDPG, VDN, QMIX) and reinforcement learning algorithms (DQN, DDPG), but it does not specify any software libraries or dependencies with version numbers used for implementation. |
| Experiment Setup | Yes | Each actor and critic network contained two fully-connected layers, both of 64 neurons. The parameters τ, γ, the learning rate, and the replay memory size were set to 0.1, 0.95, 0.01, and 100,000, respectively. See the PyTorch sketch below the table. |
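
The dataset row above points to the multi-agent particle environments of Lowe et al. (2017) and Mordatch and Abbeel (2018). The paper extract does not name the exact four scenarios, so the snippet below is only a minimal sketch of loading one such mixed competitive-cooperative scenario; 'simple_tag' (predator-prey) and the use of the maintained PettingZoo port (rather than the original openai/multiagent-particle-envs repository) are assumptions for illustration.

```python
# A minimal sketch, assuming the PettingZoo MPE port of the particle
# environments from Lowe et al. (2017). The scenario name 'simple_tag'
# is illustrative; the paper does not list its four scenarios here.
from pettingzoo.mpe import simple_tag_v3  # version suffix depends on the installed PettingZoo release

env = simple_tag_v3.parallel_env(max_cycles=25)
observations, infos = env.reset(seed=0)   # one observation per agent

while env.agents:
    # Random actions stand in for the learned MADDPG / ALC-MADDPG policies.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)

env.close()
```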
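The experiment-setup row reports two fully-connected layers of 64 neurons in each actor and critic network, with τ = 0.1, γ = 0.95, a learning rate of 0.01, and a replay memory reconstructed here as 100,000 transitions (the extracted text is garbled at that figure, so the size is an assumption). The sketch below is a minimal PyTorch rendering of those sizes, not the authors' implementation; the observation/action dimensions and the ReLU/tanh activations are illustrative choices.

```python
# A minimal PyTorch sketch of the reported network sizes and hyperparameters.
# Hidden sizes, tau, gamma, and the learning rate come from the table above;
# the replay size (100,000), activations, and I/O dimensions are assumptions.
import torch
import torch.nn as nn

TAU, GAMMA, LR, REPLAY_SIZE = 0.1, 0.95, 0.01, 100_000

class Actor(nn.Module):
    """Per-agent policy: maps a local observation to a bounded action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Critic(nn.Module):
    """Centralized critic: scores the joint observation-action vector."""
    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def soft_update(target: nn.Module, source: nn.Module, tau: float = TAU) -> None:
    """Polyak averaging of target-network parameters, as in DDPG-style methods."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)
```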