Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library

Authors: Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conducted experiments to demonstrate the efficiency of MARLlib compared to EPyMARL and the on-policy baseline (official MAPPO (Yu et al., 2022)). The experiments were performed on a local server with an NVIDIA RTX A6000 GPU and an AMD Ryzen Threadripper PRO 5945WX 12-Cores CPU. The testing scenario is MMM2 from SMAC (Samvelyan et al., 2019), and the testing algorithm is MAPPO. The total consumed timesteps are 10^6. From Table 2, it is evident that MARLlib is significantly more efficient than the other frameworks in terms of clock time... In this section, we conducted a comprehensive evaluation of 17 algorithms on 23 tasks from five widely-used MARL testing environments, namely SMAC (Samvelyan et al., 2019), MPE (Lowe et al., 2017), GRF (Kurach et al., 2020), MAMuJoCo (Peng et al., 2021), and MAgent (Zheng et al., 2018). We selected these environments for their popularity in MARL research and their diversity in task modes, observation shapes, additional information, action spaces, sparse or dense rewards, and homogeneous or heterogeneous agent types. The evaluation involved running each algorithm on each task with four different random seeds, resulting in over one thousand experiments in total. We measured the mean return achieved by each algorithm across these experiments. The results of our experiments are presented in Table 4 and Figure 6.
Researcher Affiliation Collaboration Siyi Hu1 EMAIL Yifan Zhong2 EMAIL Minquan Gao2 EMAIL Weixun Wang3 EMAIL Hao Dong2 EMAIL Xiaodan Liang4,6 EMAIL Zhihui Li5 EMAIL Xiaojun Chang1,4 EMAIL Yaodong Yang2 EMAIL 1 ReLER, AAII, University of Technology Sydney 2 Institute for Artificial Intelligence, Peking University 3 NetEase Fuxi AI Lab 4 MBZUAI 5 Shandong Artificial Intelligence Institute, Qilu University of Technology 6 School of Intelligent Systems Engineering, Sun Yat-sen University
Pseudocode No The paper describes the design and implementation of MARLlib and its components but does not include any clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code Yes The MARLlib library's source code is publicly accessible on GitHub: https://github.com/Replicable-MARL/MARLlib.
Open Datasets Yes We conducted a comprehensive evaluation of 17 algorithms on 23 tasks from five widely-used MARL testing environments, namely SMAC (Samvelyan et al., 2019), MPE (Lowe et al., 2017), GRF (Kurach et al., 2020), MAMuJoCo (Peng et al., 2021), and MAgent (Zheng et al., 2018).
Dataset Splits No The paper uses simulation environments where data is generated through interaction, and specifies training duration in terms of timesteps (e.g., 'total consumed timesteps are 10^6'). It does not provide specific train/test/validation splits for a pre-collected, fixed dataset.
Hardware Specification Yes We conducted experiments to demonstrate the efficiency of MARLlib compared to EPyMARL and the on-policy baseline (official MAPPO (Yu et al., 2022)). The experiments were performed on a local server with an NVIDIA RTX A6000 GPU and an AMD Ryzen Threadripper PRO 5945WX 12-Cores CPU.
Software Dependencies Yes We have tested the installation on Python 3.8 with both Ubuntu 18.04 and Ubuntu 20.04... # recommend always keeping the gym version at 0.21.0. $ pip install gym==0.21.0
Experiment Setup Yes mappo.fit(env, model, stop={'timesteps_total': 1000000}, checkpoint_freq=100, share_policy='group')... The total consumed timesteps are 10^6. From Table 2, it is evident that MARLlib is significantly more efficient than the other frameworks in terms of clock time... The results obtained by EPyMARL involved 40 million steps for on-policy algorithms and four million steps for off-policy algorithms. In contrast, MARLlib consumed only half of these steps for training as we found it sufficient for convergence.
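The quoted `mappo.fit(...)` call can be sketched as a minimal run configuration. Only the argument values (`timesteps_total`, `checkpoint_freq`, `share_policy`) come from the quoted snippet; the surrounding `marl.make_env` / `marl.algos` / `marl.build_model` calls are shown as comments and reflect the general MARLlib quickstart pattern, since executing them requires a full MARLlib and SMAC installation.

```python
# Configuration matching the quoted experiment setup: MAPPO on SMAC's MMM2
# map, trained for 10^6 total timesteps with grouped parameter sharing.
stop_conditions = {"timesteps_total": 1_000_000}  # "total consumed timesteps are 10^6"
run_settings = {
    "checkpoint_freq": 100,    # save a checkpoint every 100 training iterations
    "share_policy": "group",   # agents in the same group share one policy
}

# With MARLlib and SMAC installed, the corresponding run would look roughly like
# (a sketch of the library's quickstart pattern, not a verbatim recipe):
#
#   from marllib import marl
#   env = marl.make_env(environment_name="smac", map_name="MMM2")
#   mappo = marl.algos.mappo(hyperparam_source="smac")
#   model = marl.build_model(env, mappo, {"core_arch": "gru"})
#   mappo.fit(env, model, stop=stop_conditions, **run_settings)

print(stop_conditions["timesteps_total"])
```

Note that the quoted comparison against EPyMARL uses this same budget: MARLlib reached convergence with half the environment steps EPyMARL reported (20M vs. 40M for on-policy, 2M vs. 4M for off-policy algorithms).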