Attention-Based Recurrence for Multi-Agent Reinforcement Learning under Stochastic Partial Observability

Authors: Thomy Phan, Fabian Ritz, Philipp Altmann, Maximilian Zorn, Jonas Nüßlein, Michael Kölle, Thomas Gabor, Claudia Linnhoff-Popien

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and Messy SMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in Messy SMAC. |
| Researcher Affiliation | Academia | ¹University of Southern California, USA (work done at LMU Munich); ²LMU Munich, Germany. |
| Pseudocode | Yes | The complete formulation of AERIAL is given in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/thomyphan/messy_smac. Further details are in Appendix D. |
| Open Datasets | Yes | We evaluate AERIAL in Dec-Tiger as well as in a variety of original SMAC and Messy SMAC maps... (Samvelyan et al., 2019) for SMAC and (Nair et al., 2003) for Dec-Tiger. |
| Dataset Splits | No | No specific details regarding training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) are provided. |
| Hardware Specification | Yes | All training and test runs were performed in parallel on a computing cluster of fifteen x86_64 GNU/Linux (Ubuntu 18.04.5 LTS) machines with i7-8700 @ 3.2 GHz CPU (8 cores) and 64 GB RAM. We did not use any GPU in our experiments. |
| Software Dependencies | No | Our experiments are based on PyMARL and the code from (Rashid et al., 2020) under the Apache License 2.0. No specific version numbers for software dependencies are provided. |
| Experiment Setup | Yes | The transformer of AERIAL has 4 heads, with $W^c_q$, $W^c_k$, and $W^c_v$ each having one hidden layer of $d_{att} = 64$ units with ReLU activation. The subsequent MLP layers have 64 units with ReLU activation. We set the loss weight $\alpha = 0.75$ for CW-QMIX and OW-QMIX. All neural networks are trained using RMSProp with a learning rate of 0.0005. |
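For concreteness, the quoted experiment setup maps onto a small attention module. Below is a minimal PyTorch sketch (PyMARL is PyTorch-based) assuming standard multi-head scaled dot-product attention with the quoted hyperparameters; the class and variable names (`AttentionBlock`, `projection`, `in_dim`) are illustrative and not taken from the AERIAL codebase.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the reproducibility table above.
D_ATT = 64      # d_att: hidden size of the attention projections
N_HEADS = 4     # number of attention heads
ALPHA = 0.75    # loss weight for CW-QMIX / OW-QMIX (not used in this sketch)
LR = 0.0005     # RMSProp learning rate


def projection(in_dim: int, out_dim: int) -> nn.Sequential:
    """One hidden layer of d_att = 64 units with ReLU, as quoted above."""
    return nn.Sequential(
        nn.Linear(in_dim, D_ATT),
        nn.ReLU(),
        nn.Linear(D_ATT, out_dim),
    )


class AttentionBlock(nn.Module):
    """Hypothetical module matching the quoted setup, not AERIAL's actual code."""

    def __init__(self, in_dim: int):
        super().__init__()
        self.w_q = projection(in_dim, D_ATT)  # W^c_q
        self.w_k = projection(in_dim, D_ATT)  # W^c_k
        self.w_v = projection(in_dim, D_ATT)  # W^c_v
        self.attn = nn.MultiheadAttention(D_ATT, N_HEADS, batch_first=True)
        # "The subsequent MLP layers have 64 units with ReLU activation."
        self.mlp = nn.Sequential(nn.Linear(D_ATT, 64), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        out, _ = self.attn(q, k, v)  # scaled dot-product attention over the sequence
        return self.mlp(out)


# Usage: in_dim = 32 is an arbitrary placeholder for the per-agent input size.
model = AttentionBlock(in_dim=32)
optimizer = torch.optim.RMSprop(model.parameters(), lr=LR)  # as quoted
y = model(torch.randn(8, 5, 32))  # (batch=8, agents/sequence=5, features=32) -> (8, 5, 64)
```

Note that the query, key, and value projections here are two-layer MLPs rather than the single linear maps of a vanilla transformer, matching the "one hidden layer of $d_{att} = 64$ units with ReLU activation" description.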