Attention-Guided Contrastive Role Representations for Multi-agent Reinforcement Learning
Authors: Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on challenging StarCraft II micromanagement and Google Research Football tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches. |
| Researcher Affiliation | Academia | ¹Department of Control Science and Intelligent Engineering, Nanjing University; ²School of Artificial Intelligence, Nanjing University |
| Pseudocode | Yes | Based on the implementations in Section 2, we summarize the brief procedure of ACORM based on QMIX in Algorithm 1. (A hedged sketch of the core contrastive step appears after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/NJU-RL/ACORM. |
| Open Datasets | Yes | ACORM is implemented on top of two popular MARL algorithms, QMIX (Rashid et al., 2020) and MAPPO (Yu et al., 2022), and benchmarked on the challenging StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) and Google Research Football (GRF) (Kurach et al., 2020) environments. |
| Dataset Splits | No | The paper mentions evaluating 'test win rate' and using a replay buffer, but does not explicitly describe train/validation/test dataset splits (e.g., percentages or specific counts for each split). |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware specifications (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'ReLU as the activation function' and building on 'QMIX' and 'MAPPO', but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | Table 2: Hyperparameters used for ACORM based on QMIX. It lists values for 'buffer size', 'batch size', 'learning rate', 'start epsilon', 'epsilon decay steps', 'discount factor', and other training parameters. (An illustrative configuration sketch follows this table.) |
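The Pseudocode row points to Algorithm 1, which wraps QMIX training around a contrastive role-representation update. As a minimal sketch only (not the authors' implementation: `RoleEncoder`, `info_nce_loss`, the dimensions, and the temperature `tau` are all assumed names and values), the snippet below shows an InfoNCE-style contrastive loss of the kind such a procedure relies on; consult the released code at https://github.com/NJU-RL/ACORM for the actual method.

```python
# Illustrative sketch of a contrastive role-representation step.
# All names, shapes, and hyperparameters here are hypothetical placeholders,
# not the authors' implementation (see https://github.com/NJU-RL/ACORM).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleEncoder(nn.Module):
    """Maps each agent's trajectory embedding to a role representation."""
    def __init__(self, input_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def info_nce_loss(anchors, positives, negatives, tau: float = 0.1):
    """InfoNCE-style loss: pull each anchor toward its positive
    (same role cluster) and away from negatives (other clusters)."""
    anchors = F.normalize(anchors, dim=-1)       # (N, D)
    positives = F.normalize(positives, dim=-1)   # (N, D)
    negatives = F.normalize(negatives, dim=-1)   # (N, K, D)
    pos = (anchors * positives).sum(-1, keepdim=True) / tau     # (N, 1)
    neg = torch.einsum("nd,nkd->nk", anchors, negatives) / tau  # (N, K)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(len(anchors), dtype=torch.long)  # positive at index 0
    return F.cross_entropy(logits, labels)

# Toy usage: 8 agents, 32-dim trajectory embeddings, 4 negatives each.
encoder = RoleEncoder(input_dim=32, embed_dim=64)
z = encoder(torch.randn(8, 32))
loss = info_nce_loss(z, z.detach(), torch.randn(8, 4, 64))
loss.backward()
```

In the paper, positives and negatives would come from role clusters formed over the agents' trajectory embeddings; the random tensors above stand in purely to keep the example self-contained and runnable.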
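The Experiment Setup row names the hyperparameters from Table 2 but does not reproduce their values. Purely as an illustration of how such a setup is typically recorded in code, the sketch below uses placeholder values; the paper's Table 2 and the released repository hold the real settings.

```python
# Hypothetical placeholder values only -- the actual numbers are in Table 2
# of the paper and the released code; nothing here is the reported setting.
acorm_qmix_config = {
    "buffer_size": 5000,            # replay buffer capacity (episodes) -- placeholder
    "batch_size": 32,               # episodes sampled per update -- placeholder
    "learning_rate": 5e-4,          # Adam optimizer step size -- placeholder
    "start_epsilon": 1.0,           # initial exploration rate -- placeholder
    "epsilon_decay_steps": 50000,   # linear epsilon decay horizon -- placeholder
    "discount_factor": 0.99,        # gamma -- placeholder
}
```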