Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Hierarchical Multi-Agent Skill Discovery

Authors: Mingyu Yang, Yaodong Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate HMASD on sparse reward multi-agent benchmarks, and the results show that HMASD achieves significant performance improvements compared to strong MARL baselines. (Abstract) and In this section, we evaluate the effectiveness of our method. We first conduct a case study to show how HMASD effectively learns diverse useful skills and combines them to complete the task. Then, we compare HMASD with strong MARL baselines on two challenging sparse reward multi-agent benchmarks, i.e., SMAC [40] with 0-1 reward and Overcooked [41]. We further perform ablation studies for HMASD to confirm the benefits of components in our method. (Section 4, Experiments)
Researcher Affiliation	Academia	Mingyu Yang¹, Yaodong Yang², Zhenbo Lu³, Wengang Zhou¹,³, Houqiang Li¹,³; ¹University of Science and Technology of China, ²Institute for AI, Peking University, ³Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Pseudocode	Yes	A Pseudo Code of Hierarchical Multi-Agent Skill Discovery Algorithm 1: Hierarchical Multi-Agent Skill Discovery (Appendix A)
Open Source Code	No	No explicit statement about providing the open-source code for HMASD, nor a link to a repository.
Open Datasets	Yes	Then, we compare HMASD with strong MARL baselines on two challenging sparse reward multi-agent benchmarks, i.e., SMAC [40] with 0-1 reward and Overcooked [41]. (Section 4, Experiments)
Dataset Splits	No	No explicit training/validation/test dataset splits are specified. The paper mentions "eval episodes" and "eval rollout threads" but this refers to evaluation settings for reinforcement learning policies rather than dataset splits.
Hardware Specification	No	This work is supported by National Key R&D Program of China under Contract 2022ZD0119802, and National Natural Science Foundation of China under Contract 61836011. It was also supported by GPU cluster built by MCC Lab of Information Science and Technology Institution, USTC, and the Supercomputing Center of the USTC. (Acknowledgments). This only mentions "GPU cluster" without specific models or quantities.
Software Dependencies	No	No specific software dependencies with version numbers are provided.
Experiment Setup	Yes	The hyperparameter setting can be found in Appendix E. Table 1: Common hyperparameters used for HMASD, MAT and MAPPO across all tasks. Table 2: Common hyperparameters used for HMASD, MAT and MAPPO in different tasks. Table 3: Different hyperparameters used for HMASD in different scenarios. (Appendix E). The paper lists specific values in those tables.