Major-Minor Mean Field Multi-Agent Reinforcement Learning
Authors: Kai Cui, Christian Fabian, Anam Tahir, Heinz Koeppl
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate its capabilities experimentally in various scenarios. We observe a strong performance in comparison to state-of-the-art policy gradient MARL methods. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Darmstadt, Germany. Correspondence to: Kai Cui <kai.cui@tu-darmstadt.de>, Heinz Koeppl <heinz.koeppl@tu-darmstadt.de>. |
| Pseudocode | Yes | Algorithm 1 M3FMARL |
| Open Source Code | Yes | For code, see https://github.com/tudkcui/M3FC-MARL. |
| Open Datasets | No | The paper introduces custom-built simulation environments such as '2G', 'Formation', 'Beach bar process', 'Foraging', and 'Potential' problems in Section 4.1 and Appendix R.1. These environments are described in detail within the paper, but no external links, DOIs, repositories, or standard dataset citations are provided for public access to pre-existing datasets. |
| Dataset Splits | No | The paper describes the simulation environments and training process, stating 'For sake of simulation, we define the episode length T = 100 after which a new episode starts.' (Appendix R.1) and provides hyperparameters in Table 3. However, it does not specify explicit train/validation/test dataset splits as would be typical for fixed datasets. |
| Hardware Specification | Yes | We used no GPUs and around 300,000 CPU core hours on Intel Xeon Platinum 9242 CPUs. |
| Software Dependencies | Yes | Optimal transport costs are computed using POT (Flamary et al., 2021). Our M3FC MDP implementation follows the gym interface (Brockman et al., 2016), while the implementation of multi-agent RL as in the following fulfills RLlib interfaces (Liang et al., 2018). The RL implementations in our work are based on MARLlib 1.0 (Hu et al., 2023a) (MIT license), which uses RLlib 1.8 (Liang et al., 2018) (Apache-2.0 license) with hyperparameters in Table 3, and otherwise default settings. Hedged usage sketches for POT and the gym interface follow the table. |
| Experiment Setup | Yes | We use two hidden layers of 256 nodes and tanh activations for the neural networks of the policies. A hedged sketch of such a policy network follows the table. |
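
The Software Dependencies row quotes the paper's use of POT (Flamary et al., 2021) for computing optimal transport costs. The snippet below is a minimal sketch, not taken from the paper's code, of computing an exact optimal transport cost between two empirical agent distributions with POT; the sample sizes, 2-D positions, and Euclidean ground cost are illustrative assumptions.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (Flamary et al., 2021)

# Hypothetical empirical distributions of minor agents, represented by 2-D
# positions; shapes and parameters are chosen purely for illustration.
rng = np.random.default_rng(0)
x_source = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
x_target = rng.normal(loc=1.0, scale=1.0, size=(50, 2))

# Uniform weights over the empirical samples.
a = np.full(len(x_source), 1.0 / len(x_source))
b = np.full(len(x_target), 1.0 / len(x_target))

# Pairwise Euclidean ground-cost matrix and the exact optimal transport cost.
M = ot.dist(x_source, x_target, metric='euclidean')
cost = ot.emd2(a, b, M)
print(f"Optimal transport cost: {cost:.4f}")
```

`ot.emd2` solves the discrete transport problem exactly; the paper does not state which POT routine it uses, so this particular call is an assumption.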
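
The same row states that the M3FC MDP implementation follows the gym interface (Brockman et al., 2016). The class below is a purely hypothetical sketch of what a mean-field-style environment exposing that interface (the classic `reset`/`step` API used by RLlib 1.8) can look like; the histogram state, dynamics, and reward are placeholders and do not reproduce the paper's environments.

```python
import numpy as np
import gym
from gym import spaces

class MeanFieldEnvSketch(gym.Env):
    """Hypothetical gym-style environment over a distribution (histogram) state.

    Only illustrates the interface; it is not the paper's M3FC MDP.
    """

    def __init__(self, num_bins: int = 10, horizon: int = 100):
        super().__init__()
        self.num_bins = num_bins
        self.horizon = horizon  # the paper reports episode length T = 100
        self.observation_space = spaces.Box(0.0, 1.0, shape=(num_bins,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(num_bins,), dtype=np.float32)
        self.t = 0
        self.state = None

    def reset(self):
        self.t = 0
        self.state = np.full(self.num_bins, 1.0 / self.num_bins, dtype=np.float32)
        return self.state

    def step(self, action):
        # Placeholder dynamics: nudge the histogram by the action and renormalize.
        shifted = np.clip(self.state + 0.1 * np.asarray(action, dtype=np.float32), 1e-8, None)
        self.state = shifted / shifted.sum()
        # Placeholder reward: negative squared distance to the uniform distribution.
        reward = -float(np.square(self.state - 1.0 / self.num_bins).sum())
        self.t += 1
        done = self.t >= self.horizon
        return self.state, reward, done, {}
```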
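
The Experiment Setup row quotes the policy architecture: two hidden layers of 256 nodes with tanh activations. Below is a minimal PyTorch sketch of such a network; the observation and action dimensions are placeholders, and the paper's actual networks are instantiated through RLlib/MARLlib rather than as a standalone module like this.

```python
import torch
import torch.nn as nn

class PolicySketch(nn.Module):
    """Two hidden layers of 256 units with tanh activations, as quoted above.

    Input and output sizes are illustrative placeholders, not from the paper.
    """

    def __init__(self, obs_dim: int = 8, act_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.Tanh(),
            nn.Linear(256, 256), nn.Tanh(),
            nn.Linear(256, act_dim),  # e.g. parameters of the action distribution
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Example usage with a batch of random placeholder observations.
policy = PolicySketch()
print(policy(torch.randn(4, 8)).shape)  # torch.Size([4, 2])
```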