Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Collaborate with Unknown Agents in the Absence of Reward

Authors: Zuyuan Zhang, Hanhan Zhou, Mahdi Imani, Taeyoung Lee, Tian Lan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed solution is evaluated using a wide range of diverse unknown agents... We redesigned two multi-agent simulation environments based on multiagent-particle-envs (MPE) (Lowe et al. 2017) and SMAC (Whiteson et al. 2019) to create collaborative teaming tasks... We deploy STUN agents (and other baseline agents like obtained by multi-task learning) alongside these unknown agents and evaluate the teaming performance... We perform an ablation study...
Researcher Affiliation | Academia | Zuyuan Zhang¹, Hanhan Zhou¹, Mahdi Imani², Taeyoung Lee¹, Tian Lan¹ (¹The George Washington University; ²Northeastern University)
Pseudocode | Yes | The pseudo-code of our proposed STUN framework can be found in Appendix A.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We redesign two multi-agent simulation environments based on multiagent-particle-envs (MPE) (Lowe et al. 2017) and SMAC (Whiteson et al. 2019) to create collaborative teaming tasks with unknown agents.
Dataset Splits | No | The paper mentions creating a 'training dataset D' by sampling demonstrations and utilizing 'surrogate unknown agents using sampled reward functions', but it does not specify explicit training, validation, or test splits by percentages or sample counts for its experiments in the MPE and SMAC environments.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using 'MPE and SMAC environments' and 'popular MARL algorithms like MAPPO, IPPO, COMA, and IA2C' but does not specify any version numbers for these or any other software components.
Experiment Setup | No | Detailed information on our settings and training configurations like hyperparameters used can be found in the Appendix.