Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning to Collaborate with Unknown Agents in the Absence of Reward

Authors: Zuyuan Zhang, Hanhan Zhou, Mahdi Imani, Taeyoung Lee, Tian Lan

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The proposed solution is evaluated using a wide range of diverse unknown agents... We redesigned two multi-agent simulation environments based on multiagent-particle-envs(MPE) (Lowe et al. 2017) and SMAC (Whiteson et al. 2019) to create collaborative teaming tasks... We deploy STUN agents (and other baseline agents like obtained by multi-task learning) alongside these unknown agents and evaluate the teaming performance... We perform an ablation study...
Researcher Affiliation	Academia	Zuyuan Zhang¹, Hanhan Zhou¹, Mahdi Imani², Taeyoung Lee¹, Tian Lan¹; ¹The George Washington University, ²Northeastern University
Pseudocode	Yes	The pseudo-code of our proposed STUN framework can be found in Appendix A.
Open Source Code	No	The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets	Yes	We redesign two multi-agent simulation environments based on multiagent-particle-envs(MPE) (Lowe et al. 2017) and SMAC (Whiteson et al. 2019) to create collaborative teaming tasks with unknown agents.
Dataset Splits	No	The paper mentions creating a 'training dataset D' by sampling demonstrations and utilizing 'surrogate unknown agents using sampled reward functions', but it does not specify explicit training, validation, or test splits by percentages or sample counts for its experiments in the MPE and SMAC environments.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions using 'MPE and SMAC environments' and 'popular MARL algorithms like MAPPO, IPPO, COMA, and IA2C' but does not specify any version numbers for these or any other software components.
Experiment Setup	No	Detailed information on our settings and training configurations like hyperparameters used can be found in the Appendix.