Attention-Based Recurrence for Multi-Agent Reinforcement Learning under Stochastic Partial Observability

Authors: Thomy Phan, Fabian Ritz, Philipp Altmann, Maximilian Zorn, Jonas Nüßlein, Michael Kölle, Thomas Gabor, Claudia Linnhoff-Popien

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and Messy SMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in Messy SMAC. |
| Researcher Affiliation | Academia | ¹University of Southern California, USA (work done at LMU Munich); ²LMU Munich, Germany. |
| Pseudocode | Yes | The complete formulation of AERIAL is given in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/thomyphan/messy_smac. Further details are in Appendix D. |
| Open Datasets | Yes | We evaluate AERIAL in Dec-Tiger as well as in a variety of original SMAC and Messy SMAC maps... (Samvelyan et al., 2019) for SMAC and (Nair et al., 2003) for Dec-Tiger. |
| Dataset Splits | No | No specific details regarding training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) are provided. |
| Hardware Specification | Yes | All training and test runs were performed in parallel on a computing cluster of fifteen x86_64 GNU/Linux (Ubuntu 18.04.5 LTS) machines with i7-8700 @ 3.2 GHz CPU (8 cores) and 64 GB RAM. We did not use any GPU in our experiments. |
| Software Dependencies | No | Our experiments are based on PyMARL and the code from (Rashid et al., 2020) under the Apache License 2.0. No specific version numbers for software dependencies are provided. |
| Experiment Setup | Yes | The transformer of AERIAL has 4 heads, with $W^c_q$, $W^c_k$, and $W^c_v$ each having one hidden layer of $d_{att} = 64$ units with ReLU activation. The subsequent MLP layers have 64 units with ReLU activation. We set the loss weight $\alpha = 0.75$ for CW-QMIX and OW-QMIX. All neural networks are trained using RMSProp with a learning rate of 0.0005. |
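For concreteness, the quoted experiment setup maps onto a small attention module. Below is a minimal PyTorch sketch (PyMARL is PyTorch-based) assuming standard multi-head scaled dot-product attention with the quoted hyperparameters; the class and variable names (`AttentionBlock`, `projection`, `in_dim`) are illustrative and not taken from the AERIAL codebase.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the reproducibility table above.
D_ATT = 64      # d_att: hidden size of the attention projections
N_HEADS = 4     # number of attention heads
ALPHA = 0.75    # loss weight for CW-QMIX / OW-QMIX (not used in this sketch)
LR = 0.0005     # RMSProp learning rate


def projection(in_dim: int, out_dim: int) -> nn.Sequential:
    """One hidden layer of d_att = 64 units with ReLU, as quoted above."""
    return nn.Sequential(
        nn.Linear(in_dim, D_ATT),
        nn.ReLU(),
        nn.Linear(D_ATT, out_dim),
    )


class AttentionBlock(nn.Module):
    """Hypothetical module matching the quoted setup, not AERIAL's actual code."""

    def __init__(self, in_dim: int):
        super().__init__()
        self.w_q = projection(in_dim, D_ATT)  # W^c_q
        self.w_k = projection(in_dim, D_ATT)  # W^c_k
        self.w_v = projection(in_dim, D_ATT)  # W^c_v
        self.attn = nn.MultiheadAttention(D_ATT, N_HEADS, batch_first=True)
        # "The subsequent MLP layers have 64 units with ReLU activation."
        self.mlp = nn.Sequential(nn.Linear(D_ATT, 64), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        out, _ = self.attn(q, k, v)  # scaled dot-product attention over the sequence
        return self.mlp(out)


# Usage: in_dim = 32 is an arbitrary placeholder for the per-agent input size.
model = AttentionBlock(in_dim=32)
optimizer = torch.optim.RMSprop(model.parameters(), lr=LR)  # as quoted
y = model(torch.randn(8, 5, 32))  # (batch=8, agents/sequence=5, features=32) -> (8, 5, 64)
```

Note that the query, key, and value projections here are two-layer MLPs rather than the single linear maps of a vanilla transformer, matching the "one hidden layer of $d_{att} = 64$ units with ReLU activation" description.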