Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Attention-Guided Contrastive Role Representations for Multi-agent Reinforcement Learning
Authors: Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on challenging StarCraft II micromanagement and Google Research Football tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches. |
| Researcher Affiliation | Academia | 1 Department of Control Science and Intelligent Engineering, Nanjing University 2 School of Artificial Intelligence, Nanjing University |
| Pseudocode | Yes | Based on the implementations in Section 2, we summarize the brief procedure of ACORM based on QMIX in Algorithm 1. |
| Open Source Code | Yes | Our code is available at https://github.com/NJU-RL/ACORM. |
| Open Datasets | Yes | ACORM on top of two popular MARL algorithms, QMIX (Rashid et al., 2020) and MAPPO (Yu et al., 2022), benchmarked on the challenging StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) and Google Research Football (GRF) (Kurach et al., 2020) environments. |
| Dataset Splits | No | The paper mentions evaluating 'test win rate' and using a replay buffer, but does not explicitly describe train/validation/test dataset splits (e.g., percentages or specific counts for each split). |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware specifications (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'ReLU as the activation function' and building on 'QMIX' and 'MAPPO', but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | Table 2: Hyperparameters used for ACORM based on QMIX. It lists values for 'buffer size', 'batch size', 'learning rate', 'start epsilon', 'epsilon decay steps', 'discount factor', and other training parameters. |