Attention-Guided Contrastive Role Representations for Multi-agent Reinforcement Learning

Authors: Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on challenging StarCraft II micromanagement and Google Research Football tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches.
Researcher Affiliation | Academia | (1) Department of Control Science and Intelligent Engineering, Nanjing University; (2) School of Artificial Intelligence, Nanjing University
Pseudocode | Yes | Based on the implementations in Section 2, we summarize the brief procedure of ACORM based on QMIX in Algorithm 1.
Open Source Code | Yes | Our code is available at https://github.com/NJU-RL/ACORM.
Open Datasets | Yes | ACORM is implemented on top of two popular MARL algorithms, QMIX (Rashid et al., 2020) and MAPPO (Yu et al., 2022), and benchmarked on the challenging StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) and Google Research Football (GRF) (Kurach et al., 2020) environments.
Dataset Splits | No | The paper mentions evaluating 'test win rate' and using a replay buffer, but does not explicitly describe train/validation/test dataset splits (e.g., percentages or specific counts for each split).
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models or CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and ReLU activation functions, and building on QMIX and MAPPO, but does not provide version numbers for any software components or libraries.
Experiment Setup | Yes | Table 2 lists the hyperparameters used for ACORM based on QMIX, including 'buffer size', 'batch size', 'learning rate', 'start epsilon', 'epsilon decay steps', 'discount factor', and other training parameters (an illustrative configuration sketch follows below).
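For illustration only, the sketch below shows how the hyperparameter names reported in Table 2 could be gathered into a single configuration object and paired with a SMAC environment created through the public smac package. The numeric values are placeholders (not the values reported in the paper), and the QMixConfig dataclass and make_env helper are hypothetical conveniences, not part of the ACORM codebase.

```python
# Illustrative sketch only: field names mirror the hyperparameters listed in
# Table 2 of the paper, but the values below are placeholders, NOT the paper's values.
from dataclasses import dataclass

from smac.env import StarCraft2Env  # public SMAC benchmark package (requires StarCraft II)


@dataclass
class QMixConfig:
    buffer_size: int = 5000           # replay buffer capacity (episodes) -- placeholder
    batch_size: int = 32              # episodes sampled per update -- placeholder
    learning_rate: float = 5e-4       # Adam step size -- placeholder
    start_epsilon: float = 1.0        # initial exploration rate -- placeholder
    epsilon_decay_steps: int = 50000  # steps over which epsilon is annealed -- placeholder
    discount_factor: float = 0.99     # gamma -- placeholder


def make_env(map_name: str = "3m") -> StarCraft2Env:
    """Create a StarCraft Multi-Agent Challenge environment for the given map."""
    return StarCraft2Env(map_name=map_name)


if __name__ == "__main__":
    cfg = QMixConfig()
    env = make_env()
    env_info = env.get_env_info()  # standard smac API: n_agents, n_actions, episode_limit, ...
    print(cfg)
    print("n_agents:", env_info["n_agents"], "episode_limit:", env_info["episode_limit"])
```

This structure only demonstrates how the parameters named in the report relate to a training setup; the actual values and training loop should be taken from Table 2 and the released code at https://github.com/NJU-RL/ACORM.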