Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning
Authors: Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, Weinan Zhang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). This appendix introduces some of the key evaluation results of MALib, and more results can be found on our project website (see issue #35). The evaluation focuses on both system and algorithm performance, including the comparison of data throughput, training efficiency, and algorithm convergence performance. |
| Researcher Affiliation | Academia | ¹ Department of Computer Science and Engineering, Shanghai Jiao Tong University; ² Institute for Artificial Intelligence, Peking University; ³ Department of Computer Science, University College London; corresponding authors are marked in the paper |
| Pseudocode | No | The paper describes the framework components and their interactions (e.g., Coordinator dispatching tasks to Actors and Learners, Figure 2), but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). |
| Open Datasets | Yes | As the environment for throughput comparison, we adopt the multi-agent version of Atari games (MA-Atari) from PettingZoo (Terry et al., 2020), a collection of 2D video games with multiple agents. We compared MALib with OpenSpiel (Lanctot et al., 2019) on solving Leduc Poker, a common benchmark in Poker AI. Multi-agent Particle Environments (MPE) (Lowe et al., 2017) is a typical benchmark environment for the research of MARL. |
| Dataset Splits | No | The paper mentions using specific environments like MA-Atari, Leduc Poker, and MPE, and describes some experimental parameters like running 2,000 simulations for Leduc Poker. However, it does not provide explicit details on dataset splits (e.g., train/test/validation percentages or counts) for any of these environments. |
| Hardware Specification | Yes | All the experiment results are obtained with one of the following hardware settings: System #1: a 32-core computing node with dual graphics cards; System #2: a two-node cluster with each node owning 128 cores and a single graphics card. All the GPUs mentioned are of the same model (NVIDIA RTX 3090). |
| Software Dependencies | No | The development of MALib is based on Python, Ray (Moritz et al., 2018) and PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | For each worker, we fixed the number of environments as 100. The number of workers ranges from 1 to 128 to compare the upper bound and bottleneck in the parallelism performance of different frameworks. To get a relatively accurate empirical payoff, we run 2,000 simulations for each policy combination, and the maximum population size is limited to 100. Specifically, it is constructed as a ConvNet with three convolutional layers and two fully connected heads for the actor and critic. |
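The payoff-estimation step quoted in the Experiment Setup row (2,000 simulations per policy combination, population size capped at 100) can be illustrated with a minimal Monte Carlo sketch. It is not MALib's or OpenSpiel's code: the function `empirical_payoff_table`, the `simulate` callback, and the matching-pennies toy game standing in for Leduc Poker are all hypothetical, and policies are reduced to a single "probability of heads" for brevity.

```python
import random
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical episode simulator: (row policy, column policy, rng) -> (row payoff, column payoff).
Simulate = Callable[[float, float, random.Random], Tuple[float, float]]


def matching_pennies(p: float, q: float, rng: random.Random) -> Tuple[float, float]:
    """Toy zero-sum game (a stand-in for Leduc Poker): each policy is P(heads)."""
    row_heads = rng.random() < p
    col_heads = rng.random() < q
    r = 1.0 if row_heads == col_heads else -1.0
    return r, -r


def empirical_payoff_table(
    row_pop: Sequence[float],
    col_pop: Sequence[float],
    simulate: Simulate,
    num_simulations: int = 2000,  # simulations per policy combination, as in the quoted setup
    max_population: int = 100,    # population-size cap, as in the quoted setup
    seed: int = 0,
) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Estimate mean payoffs for every (row policy, column policy) pair by Monte Carlo averaging."""
    if len(row_pop) > max_population or len(col_pop) > max_population:
        raise ValueError("population exceeds max_population")
    rng = random.Random(seed)
    table: Dict[Tuple[int, int], Tuple[float, float]] = {}
    for i, j in product(range(len(row_pop)), range(len(col_pop))):
        total_row = total_col = 0.0
        for _ in range(num_simulations):
            r, c = simulate(row_pop[i], col_pop[j], rng)
            total_row += r
            total_col += c
        table[(i, j)] = (total_row / num_simulations, total_col / num_simulations)
    return table


if __name__ == "__main__":
    # Pure "heads" (1.0) and pure "tails" (0.0) policies give exact payoffs.
    table = empirical_payoff_table([1.0, 0.0], [1.0, 0.0], matching_pennies)
    print(table)
```

With deterministic policies every simulation returns the same payoff, so the estimates are exact; with mixed policies the estimate's standard error shrinks roughly as one over the square root of `num_simulations`, which is the usual reason for running thousands of simulations per entry.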