Major-Minor Mean Field Multi-Agent Reinforcement Learning
Authors: Kai Cui, Christian Fabian, Anam Tahir, Heinz Koeppl
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate its capabilities experimentally in various scenarios. We observe a strong performance in comparison to state-of-the-art policy gradient MARL methods. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Darmstadt, Germany. Correspondence to: Kai Cui <kai.cui@tu-darmstadt.de>, Heinz Koeppl <heinz.koeppl@tu-darmstadt.de>. |
| Pseudocode | Yes | Algorithm 1 M3FMARL |
| Open Source Code | Yes | For code, see https://github.com/tudkcui/M3FC-MARL. |
| Open Datasets | No | The paper introduces custom-built simulation environments such as '2G', 'Formation', 'Beach bar process', 'Foraging', and 'Potential' problems in Section 4.1 and Appendix R.1. These environments are described in detail within the paper, but no external links, DOIs, repositories, or standard dataset citations are provided for public access to pre-existing datasets. |
| Dataset Splits | No | The paper describes the simulation environments and training process, stating 'For sake of simulation, we define the episode length T = 100 after which a new episode starts.' (Appendix R.1) and provides hyperparameters in Table 3. However, it does not specify explicit train/validation/test dataset splits as would be typical for fixed datasets. |
| Hardware Specification | Yes | We used no GPUs and around 300,000 CPU core hours on Intel Xeon Platinum 9242 CPUs. |
| Software Dependencies | Yes | Optimal transport costs are computed using POT (Flamary et al., 2021). Our M3FC MDP implementation follows the gym interface (Brockman et al., 2016), while the implementation of multi-agent RL as in the following fulfills RLlib interfaces (Liang et al., 2018). The RL implementations in our work are based on MARLlib 1.0 (Hu et al., 2023a) (MIT license), which uses RLlib 1.8 (Liang et al., 2018) (Apache-2.0 license) with hyperparameters in Table 3, and otherwise default settings. Hedged usage sketches for POT and the gym interface follow the table. |
| Experiment Setup | Yes | We use two hidden layers of 256 nodes and tanh activations for the neural networks of the policies. A hedged sketch of such a policy network follows the table. |
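
The Software Dependencies row quotes the paper's use of POT (Flamary et al., 2021) for computing optimal transport costs. The snippet below is a minimal sketch, not taken from the paper's code, of computing an exact optimal transport cost between two empirical agent distributions with POT; the sample sizes, 2-D positions, and Euclidean ground cost are illustrative assumptions.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (Flamary et al., 2021)

# Hypothetical empirical distributions of minor agents, represented by 2-D
# positions; shapes and parameters are chosen purely for illustration.
rng = np.random.default_rng(0)
x_source = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
x_target = rng.normal(loc=1.0, scale=1.0, size=(50, 2))

# Uniform weights over the empirical samples.
a = np.full(len(x_source), 1.0 / len(x_source))
b = np.full(len(x_target), 1.0 / len(x_target))

# Pairwise Euclidean ground-cost matrix and the exact optimal transport cost.
M = ot.dist(x_source, x_target, metric='euclidean')
cost = ot.emd2(a, b, M)
print(f"Optimal transport cost: {cost:.4f}")
```

`ot.emd2` solves the discrete transport problem exactly; the paper does not state which POT routine it uses, so this particular call is an assumption.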
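
The same row states that the M3FC MDP implementation follows the gym interface (Brockman et al., 2016). The class below is a purely hypothetical sketch of what a mean-field-style environment exposing that interface (the classic `reset`/`step` API used by RLlib 1.8) can look like; the histogram state, dynamics, and reward are placeholders and do not reproduce the paper's environments.

```python
import numpy as np
import gym
from gym import spaces

class MeanFieldEnvSketch(gym.Env):
    """Hypothetical gym-style environment over a distribution (histogram) state.

    Only illustrates the interface; it is not the paper's M3FC MDP.
    """

    def __init__(self, num_bins: int = 10, horizon: int = 100):
        super().__init__()
        self.num_bins = num_bins
        self.horizon = horizon  # the paper reports episode length T = 100
        self.observation_space = spaces.Box(0.0, 1.0, shape=(num_bins,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(num_bins,), dtype=np.float32)
        self.t = 0
        self.state = None

    def reset(self):
        self.t = 0
        self.state = np.full(self.num_bins, 1.0 / self.num_bins, dtype=np.float32)
        return self.state

    def step(self, action):
        # Placeholder dynamics: nudge the histogram by the action and renormalize.
        shifted = np.clip(self.state + 0.1 * np.asarray(action, dtype=np.float32), 1e-8, None)
        self.state = shifted / shifted.sum()
        # Placeholder reward: negative squared distance to the uniform distribution.
        reward = -float(np.square(self.state - 1.0 / self.num_bins).sum())
        self.t += 1
        done = self.t >= self.horizon
        return self.state, reward, done, {}
```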
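
The Experiment Setup row quotes the policy architecture: two hidden layers of 256 nodes with tanh activations. Below is a minimal PyTorch sketch of such a network; the observation and action dimensions are placeholders, and the paper's actual networks are instantiated through RLlib/MARLlib rather than as a standalone module like this.

```python
import torch
import torch.nn as nn

class PolicySketch(nn.Module):
    """Two hidden layers of 256 units with tanh activations, as quoted above.

    Input and output sizes are illustrative placeholders, not from the paper.
    """

    def __init__(self, obs_dim: int = 8, act_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.Tanh(),
            nn.Linear(256, 256), nn.Tanh(),
            nn.Linear(256, act_dim),  # e.g. parameters of the action distribution
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Example usage with a batch of random placeholder observations.
policy = PolicySketch()
print(policy(torch.randn(4, 8)).shape)  # torch.Size([4, 2])
```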