Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Authors: Hangyu Mao, Wulong Liu, Jianye Hao, Jun Luo, Dong Li, Zhengchao Zhang, Jun Wang, Zhen Xiao
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches. |
| Researcher Affiliation | Collaboration | 1) Peking University, 2) Noah's Ark Lab, Huawei, 3) Tianjin University, 4) University College London |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | For continuous actions, we adopt the packet routing tasks proposed by ATT-MADDPG (Mao et al. 2019) to ensure fair comparison. For discrete actions, we test wifi configuration and Google football environments. ... We refer the readers to (Kurach et al. 2019) for the details [of Google Football]. |
| Dataset Splits | No | The paper describes the experimental environments and scenarios (e.g., packet routing, wifi configuration, Google Football 2-vs-2, 3-vs-2) but does not provide specific details on how the data within these environments are split into training, validation, and test sets (e.g., percentages, sample counts, or explicit standard splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific solver versions). |
| Experiment Setup | Yes | In our experiments, the weights of CD-loss are α = 0.1, α = 0.1 and α = 0.2 for packet routing, wifi configuration and Google football, respectively. In order to accelerate learning in Google football, we reduce the action set to {top-right-move, bottom-right-move, high-pass, shot} during exploration, and we give a reward 1.0, 0.7 and 0.3 to the last three actions respectively if the agents score a goal. |
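
The Experiment Setup row quotes the paper's loss weighting and reward shaping verbatim. Since no code is released, the sketch below is only an illustration of how such a setup is commonly wired together: the function names (`total_loss`, `shaped_reward`), the additive combination of losses, and the add-vs-replace treatment of the goal bonus are all assumptions; only the α values per task, the reduced action set, and the 1.0/0.7/0.3 reward values come from the paper.

```python
# Hypothetical sketch of the reported experiment setup. The paper releases
# no code, so structure and names here are illustrative assumptions; only
# the constants below are taken from the paper's text.

# CD-loss weight alpha, per task, as reported in the paper.
CD_LOSS_ALPHA = {
    "packet_routing": 0.1,
    "wifi_configuration": 0.1,
    "google_football": 0.2,
}

def total_loss(td_loss: float, cd_loss: float, task: str) -> float:
    """Combine the standard RL loss with the alpha-weighted CD-loss.

    An additive combination is assumed; the paper specifies only the
    per-task alpha values, not the exact form of the objective.
    """
    return td_loss + CD_LOSS_ALPHA[task] * cd_loss

# Reduced exploration action set for Google football, from the paper.
FOOTBALL_ACTIONS = ["top-right-move", "bottom-right-move", "high-pass", "shot"]

# Shaped rewards of 1.0, 0.7 and 0.3 for the last three actions above,
# granted when the agents score a goal.
GOAL_BONUS = {"bottom-right-move": 1.0, "high-pass": 0.7, "shot": 0.3}

def shaped_reward(env_reward: float, last_action: str, scored: bool) -> float:
    """Apply the action-conditioned goal bonus.

    Whether the bonus is added to or replaces the environment reward is
    not stated in the paper; adding it is an assumption.
    """
    if scored and last_action in GOAL_BONUS:
        return env_reward + GOAL_BONUS[last_action]
    return env_reward
```

For example, under these assumptions a goal scored immediately after a `high-pass` in Google football would yield `shaped_reward(1.0, "high-pass", True) == 1.7`, and the corresponding training step would minimize `total_loss(td, cd, "google_football") == td + 0.2 * cd`.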