Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Collaborate with Unknown Agents in the Absence of Reward
Authors: Zuyuan Zhang, Hanhan Zhou, Mahdi Imani, Taeyoung Lee, Tian Lan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed solution is evaluated using a wide range of diverse unknown agents... We redesigned two multi-agent simulation environments based on multiagent-particle-envs (MPE) (Lowe et al. 2017) and SMAC (Whiteson et al. 2019) to create collaborative teaming tasks... We deploy STUN agents (and other baseline agents like obtained by multi-task learning) alongside these unknown agents and evaluate the teaming performance... We perform an ablation study... |
| Researcher Affiliation | Academia | Zuyuan Zhang¹, Hanhan Zhou¹, Mahdi Imani², Taeyoung Lee¹, Tian Lan¹; ¹The George Washington University, ²Northeastern University |
| Pseudocode | Yes | The pseudo-code of our proposed STUN framework can be found in Appendix A. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We redesign two multi-agent simulation environments based on multiagent-particle-envs (MPE) (Lowe et al. 2017) and SMAC (Whiteson et al. 2019) to create collaborative teaming tasks with unknown agents. |
| Dataset Splits | No | The paper mentions creating a 'training dataset D' by sampling demonstrations and utilizing 'surrogate unknown agents using sampled reward functions', but it does not specify explicit training, validation, or test splits by percentages or sample counts for its experiments in the MPE and SMAC environments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'MPE and SMAC environments' and 'popular MARL algorithms like MAPPO, IPPO, COMA, and IA2C' but does not specify any version numbers for these or any other software components. |
| Experiment Setup | No | Detailed information on our settings and training configurations like hyperparameters used can be found in the Appendix. |