Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards a Standardised Performance Evaluation Protocol for Cooperative MARL
Authors: Rihab Gorsane, Omayma Mahjoub, Ruan John de Kock, Roland Dubb, Siddarth Singh, Arnu Pretorius
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By conducting a detailed meta-analysis of prior work, spanning 75 papers accepted for publication from 2016 to 2022, we bring to light worrying trends that put into question the true rate of progress. We further consider these trends in a wider context and take inspiration from single-agent RL literature on similar issues with recommendations that remain applicable to MARL. Combining these recommendations, with novel insights from our analysis, we propose a standardised performance evaluation protocol for cooperative MARL. Finally, we release our meta-analysis data publicly on our project website for future research on evaluation, accompanied by our open-source evaluation tools repository. |
| Researcher Affiliation | Collaboration | Rihab Gorsane¹, Omayma Mahjoub¹,², Ruan de Kock¹, Roland Dubb¹,³, Siddarth Singh¹, Arnu Pretorius¹ (¹InstaDeep; ²National School of Computer Science, Tunisia; ³University of Cape Town, South Africa) |
| Pseudocode | No | The paper presents its proposed protocol in a bulleted list format within a blue box, but it is a set of recommendations and guidelines, not a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Finally, we release our meta-analysis data publicly on our project website for future research on evaluation, accompanied by our open-source evaluation tools repository: https://github.com/instadeepai/marl-eval. A hedged sketch of the kind of aggregation this tooling supports follows the table. |
| Open Datasets | Yes | Finally, we release our meta-analysis data publicly on our project website for future research on evaluation, accompanied by our open-source evaluation tools repository. In total, we collected data from 75 cooperative MARL papers accepted for publication... We believe this dataset is the first of its kind and we have made it publicly available for further analysis. |
| Dataset Splits | No | This paper conducts a meta-analysis of existing papers and does not involve training machine learning models with specific dataset splits for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to conduct its meta-analysis. |
| Software Dependencies | No | The paper provides a link to an open-source evaluation tools repository but does not list specific software dependencies with version numbers used for its meta-analysis. |
| Experiment Setup | No | The paper describes the parameters for the *recommended* evaluation protocol for MARL, but not the specific experimental setup (e.g., hyperparameters, training configurations) for *its own* meta-analysis. |
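
Since the Open Source Code row points to the released marl-eval tooling, the following is a minimal sketch of the kind of score aggregation the paper's protocol recommends (IQM with bootstrap confidence intervals), written against the rliable library from the single-agent RL evaluation literature that marl-eval builds on. The algorithm names and score matrices are synthetic placeholders, and `reps` is reduced from typical values for speed; this illustrates the general pattern, not the exact marl-eval API.

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

# Synthetic placeholder scores: one matrix per algorithm with shape
# (num_independent_runs, num_tasks), normalised to [0, 1].
rng = np.random.default_rng(0)
score_dict = {
    "algo_a": rng.uniform(0.4, 0.9, size=(10, 5)),  # hypothetical algorithm
    "algo_b": rng.uniform(0.3, 0.8, size=(10, 5)),  # hypothetical algorithm
}

# Aggregate each algorithm's scores with median, IQM, and mean; the
# interquartile mean (IQM) is the headline statistic the protocol favours.
aggregate_fn = lambda x: np.array([
    metrics.aggregate_median(x),
    metrics.aggregate_iqm(x),
    metrics.aggregate_mean(x),
])

# Bootstrap point estimates and confidence intervals over runs
# (reps kept small here for a quick check; increase for real reporting).
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, aggregate_fn, reps=2000
)

for algo, (median, iqm, mean) in point_estimates.items():
    print(f"{algo}: median={median:.3f} IQM={iqm:.3f} mean={mean:.3f}")
```

Reporting IQM with stratified bootstrap intervals, rather than a bare mean over a handful of seeds, is the core of the statistical recommendations the paper carries over from the single-agent RL literature it cites.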