Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

Authors: Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, Weinan Zhang

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib). This appendix introduces some of the key evaluation results of MALib, and more results can be found on our project website (see issue #35). The evaluation focuses on both system and algorithm performance, including the comparison of data throughput, training efficiency, and algorithms convergence performance.
Researcher Affiliation	Academia	1 Department of Computer Science and Engineering, Shanghai Jiao Tong University 2 Institute for Artificial Intelligence, Peking University 3 Department of Computer Science, University College London corresponding authors
Pseudocode	No	The paper describes the framework components and their interactions (e.g., Coordinator dispatching tasks to Actors and Learners, Figure 2), but it does not present any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib).
Open Datasets	Yes	As the environment for throughput comparison, we adopt the multi-agent version of Atari games (MA-Atari) from PettingZoo (Terry et al., 2020), a collection of 2D video games with multiple agents. We compared MALib with OpenSpiel (Lanctot et al., 2019) on solving Leduc Poker, a common benchmark in Poker AI. Multi-agent Particle Environments (MPE) (Lowe et al., 2017) is a typical benchmark environment for the research of MARL.
Dataset Splits	No	The paper mentions using specific environments like MA-Atari, Leduc Poker, and MPE, and describes some experimental parameters like running 2,000 simulations for Leduc Poker. However, it does not provide explicit details on dataset splits (e.g., train/test/validation percentages or counts) for any of these environments.
Hardware Specification	Yes	All the experiment results are obtained with one of the following hardware settings: System #1: a 32-core computing node with dual graphics cards; System #2: a two-node cluster with each node owning 128-core and a single graphics card. All the GPUs mentioned are of the same model (NVIDIA RTX3090).
Software Dependencies	No	The development of MALib is based on Python, Ray (Moritz et al., 2018) and PyTorch (Paszke et al., 2019).
Experiment Setup	Yes	For each worker, we fixed the number of environments as 100. The number of workers ranges from 1 to 128 to compare the upper bound and bottleneck in the parallelism performance of different frameworks. To get a relatively accurate empirical payoff, we run 2,000 simulations for each policy combination, and the maximum of population size is limited to 100. Specifically, it is constructed as a ConvNet with three convolutional layers, and two fully-connected heads for the actor and critic.
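
The Experiment Setup figures quoted above imply a simple scale calculation, sketched below. This is illustrative arithmetic only and is not taken from the paper or the MALib code base. The function names are hypothetical. The worker counts are assumed to be powers of two (the source gives only the range 1 to 128). The payoff table is assumed to belong to a two-player game with one population per player.

```python
# Illustrative arithmetic only. Function names, the power-of-two worker sweep,
# and the two-player assumption are NOT from the paper or from MALib.

ENVS_PER_WORKER = 100        # quoted: environments fixed at 100 per worker
SIMS_PER_COMBINATION = 2000  # quoted: 2,000 simulations per policy combination
MAX_POPULATION = 100         # quoted: population size limited to 100


def total_environments(num_workers: int) -> int:
    """Environments running concurrently for a given number of workers."""
    return num_workers * ENVS_PER_WORKER


def max_payoff_simulations(population_size: int, num_players: int = 2) -> int:
    """Upper bound on simulations needed to fill a full empirical payoff table,
    assuming each player draws from a population of `population_size` policies."""
    combinations = population_size ** num_players
    return combinations * SIMS_PER_COMBINATION


if __name__ == "__main__":
    # Assumed sweep: powers of two within the quoted range of 1 to 128 workers.
    for workers in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"{workers:>3} workers: {total_environments(workers)} environments")
    print("payoff-table upper bound:", max_payoff_simulations(MAX_POPULATION))
```

Under these assumptions, the largest configuration runs 12,800 environments at once, and a full 100-by-100 payoff table would need at most 20,000,000 simulations. A population-based method that grows its population incrementally would typically evaluate only the new policy combinations at each step, so the actual count may be lower.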