Attention-Based Recurrence for Multi-Agent Reinforcement Learning under Stochastic Partial Observability
Authors: Thomy Phan, Fabian Ritz, Philipp Altmann, Maximilian Zorn, Jonas Nüßlein, Michael Kölle, Thomas Gabor, Claudia Linnhoff-Popien
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and Messy SMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in Messy SMAC. |
| Researcher Affiliation | Academia | 1: University of Southern California, USA (work done at LMU Munich); 2: LMU Munich, Germany. |
| Pseudocode | Yes | The complete formulation of AERIAL is given in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/thomyphan/messy_smac. Further details are in Appendix D. |
| Open Datasets | Yes | We evaluate AERIAL in Dec-Tiger as well as in a variety of original SMAC and Messy SMAC maps... (Samvelyan et al., 2019) for SMAC and (Nair et al., 2003) for Dec-Tiger. |
| Dataset Splits | No | No specific details regarding training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) are provided. |
| Hardware Specification | Yes | All training and test runs were performed in parallel on a computing cluster of fifteen x86_64 GNU/Linux (Ubuntu 18.04.5 LTS) machines with i7-8700 @ 3.2GHz CPU (8 cores) and 64 GB RAM. We did not use any GPU in our experiments. |
| Software Dependencies | No | Our experiments are based on PyMARL and the code from (Rashid et al., 2020) under the Apache License 2.0. No specific version numbers for software dependencies are provided. |
| Experiment Setup | Yes | The transformer of AERIAL has 4 heads, with W_q^c, W_k^c, and W_v^c each having one hidden layer of d_att = 64 units with ReLU activation. The subsequent MLP layers have 64 units with ReLU activation. We set the loss weight α = 0.75 for CW-QMIX and OW-QMIX. All neural networks are trained using RMSProp with a learning rate of 0.0005. |
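To make the reported architecture concrete, the sketch below builds a minimal multi-head self-attention layer in NumPy matching the stated configuration: 4 heads, ReLU-activated single-hidden-layer projections of d_att = 64 units for W_q^c, W_k^c, and W_v^c, followed by a 64-unit ReLU MLP. This is not the authors' implementation; the input dimension, weight shapes, and exact wiring are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MultiHeadSelfAttention:
    """Hedged sketch of the reported AERIAL transformer layer (assumed wiring)."""

    def __init__(self, d_in, d_att=64, n_heads=4, seed=0):
        rng = np.random.default_rng(seed)
        self.n_heads, self.d_att = n_heads, d_att
        # One ReLU hidden layer of d_att = 64 units per projection
        # (our assumed reading of "one hidden layer of d_att = 64 units").
        shape = (n_heads, d_in, d_att)
        self.Wq = rng.standard_normal(shape) * 0.1
        self.Wk = rng.standard_normal(shape) * 0.1
        self.Wv = rng.standard_normal(shape) * 0.1
        # Subsequent 64-unit ReLU MLP on the concatenated head outputs.
        self.W_mlp = rng.standard_normal((n_heads * d_att, 64)) * 0.1

    def __call__(self, x):
        # x: (seq_len, d_in) -> output: (seq_len, 64)
        q = relu(np.einsum("td,hdk->htk", x, self.Wq))
        k = relu(np.einsum("td,hdk->htk", x, self.Wk))
        v = relu(np.einsum("td,hdk->htk", x, self.Wv))
        # Scaled dot-product attention per head: (heads, seq, seq).
        scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.d_att))
        heads = scores @ v                        # (heads, seq, d_att)
        concat = heads.transpose(1, 0, 2).reshape(x.shape[0], -1)
        return relu(concat @ self.W_mlp)

attn = MultiHeadSelfAttention(d_in=32)
out = attn(np.ones((5, 32)))
print(out.shape)  # (5, 64)
```

In the paper's setting, such a layer would be trained with RMSProp at learning rate 0.0005 (quoted above); the optimizer is omitted here for brevity.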