ROMA: Multi-Agent Reinforcement Learning with Emergent Roles

Authors: Tonghan Wang, Heng Dong, Victor Lesser, Chongjie Zhang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method can learn specialized, dynamic, and identifiable roles, which help our method push forward the state of the art on the StarCraft II micromanagement benchmark. Demonstrative videos are available at https://sites.google.com/view/romarl/. We test our method on StarCraft II micromanagement environments (Vinyals et al., 2017; Samvelyan et al., 2019). Results show that our method significantly pushes forward the state of the art of MARL algorithms, by virtue of the adaptive policy sharing among agents with similar roles. From Section 5 (Experiments): Our experiments aim to answer the following questions: (1) Can the learned roles automatically adapt in dynamic environments? (Sec. 5.1.) (2) Can our method promote sub-task specialization? That is, agents with similar responsibilities have similar role embedding representations, while agents with different responsibilities have role embedding representations far from each other. (Sec. 5.1, 5.3.) (3) Can such sub-task specialization improve the performance of multi-agent reinforcement learning algorithms? (Sec. 5.2.) (4) How do roles evolve during training, and how do they influence team performance? (Sec. 5.4.) (5) Can the dissimilarity model dφ learn to measure the dissimilarity between agents' trajectories? (Sec. 5.4.)
Researcher Affiliation | Academia | Tonghan Wang¹, Heng Dong¹, Victor Lesser², Chongjie Zhang¹. ¹IIIS, Tsinghua University, Beijing, China; ²University of Massachusetts, Amherst, USA.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Videos of our experiments and the code are available online; the code is at https://github.com/TonghanWang/ROMA.
Open Datasets | Yes | We test our method on StarCraft II micromanagement environments (Vinyals et al., 2017; Samvelyan et al., 2019).
Dataset Splits | No | The paper uses the StarCraft II micromanagement environments but does not explicitly provide training/validation/test dataset splits (e.g., percentages or counts) or reference predefined splits for this environment.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | We carry out a grid search over the loss coefficients λI and λD, and fix them at 10^-4 and 10^-2, respectively, across all the experiments. The dimensionality of the latent role space is set to 3, so we did not use any dimensionality reduction techniques when visualizing the role embedding representations. Other hyperparameters are also fixed in our experiments and are listed in Appendix B.1. (A sketch of how these coefficients might enter the training objective follows the table.)
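
To make the reported setup concrete, below is a minimal PyTorch sketch of one plausible way the two coefficients could combine the main TD objective with the role regularizers. The term names (td_loss, identifiability_loss, dissimilarity_loss) and the additive form are assumptions inferred from the paper's description of λI and λD as loss coefficients, not the authors' released code; their actual implementation is at https://github.com/TonghanWang/ROMA.

import torch

# Hypothetical sketch, not the authors' code: combines the TD loss with the
# two role-embedding regularizers, using the coefficients reported in the paper.
LAMBDA_I = 1e-4  # loss coefficient lambda_I, fixed after a grid search
LAMBDA_D = 1e-2  # loss coefficient lambda_D, fixed after a grid search
ROLE_DIM = 3     # latent role dimensionality; 3-D embeddings can be plotted directly

def total_loss(td_loss: torch.Tensor,
               identifiability_loss: torch.Tensor,
               dissimilarity_loss: torch.Tensor) -> torch.Tensor:
    """Weighted sum of the main TD objective and the two role regularizers."""
    return td_loss + LAMBDA_I * identifiability_loss + LAMBDA_D * dissimilarity_loss

One reading of the reported magnitudes, assuming this additive form, is that the TD term dominates the objective while the small regularizer weights gently shape the role space rather than compete with the main learning signal.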