RODE: Learning Roles to Decompose Multi-Agent Tasks
Authors: Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | our method (1) outperforms the current state-of-the-art MARL algorithms on 9 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark and (2) achieves rapid transfer to new environments with three times the number of agents. |
| Researcher Affiliation | Academia | Institute for Interdisciplinary Information Sciences, Tsinghua University (tonghanwang1996@gmail.com); University of Oxford ({tarun.gupta, anuj.mahajan, bei.peng}@cs.ox.ac.uk; shimon.whiteson@cs.ox.ac.uk); Tsinghua University (chongjie@tsinghua.edu.cn) |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1) but no pseudocode or algorithm blocks. |
| Open Source Code | No | The abstract states: "Demonstrative videos can be viewed at https://sites.google.com/view/rode-marl.", which links to videos, not source code. The paper also mentions using "codes provided by their authors" for baselines, but does not state that RODE's code is open-source or available. |
| Open Datasets | Yes | We choose the StarCraft II micromanagement (SMAC) benchmark (Samvelyan et al., 2019) as the testbed for its rich environments and high complexity of control. (A hedged usage sketch of the SMAC environment appears after the table.) |
| Dataset Splits | No | The paper mentions running experiments with random seeds and showing median performance and percentiles, and using a replay buffer for sampling episodes. However, it does not specify a train/validation/test dataset split with percentages or sample counts for model evaluation. |
| Hardware Specification | Yes | Experiments are carried out on NVIDIA GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions the use of "RMSprop" optimizer and "QMIX"-style mixing networks, but does not provide specific version numbers for any software, libraries, or programming languages used (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For all experiments, the optimization is conducted using RMSprop with a learning rate of 5 × 10⁻⁴, α of 0.99, and with no momentum or weight decay. For exploration, we use ϵ-greedy with ϵ annealed linearly from 1.0 to 0.05 over 50K time steps and kept constant for the rest of the training. For three hard exploration maps (3s5z_vs_3s6z, 6h_vs_8z, and 27m_vs_30m) we extend the epsilon annealing time to 500K, for both RODE and all the baselines and ablations. Batches of 32 episodes are sampled from the replay buffer, and the role selector and role policies are trained end-to-end on fully unrolled episodes. (A hedged configuration sketch also follows the table.) |
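The SMAC benchmark cited in the Open Datasets row is distributed as an open-source Python package. Below is a minimal interaction sketch, assuming the `smac` package and its standard `StarCraft2Env` API; the map name `3m` and the random action selection are chosen only for illustration and are not part of RODE itself.

```python
import numpy as np
from smac.env import StarCraft2Env

# Minimal SMAC rollout with random valid actions; RODE's role selector and
# role policies would replace the random action choice below.
env = StarCraft2Env(map_name="3m")  # illustrative map, not from the paper's hard-exploration set
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)            # binary action mask
        actions.append(np.random.choice(np.nonzero(avail)[0]))   # pick any available action
    reward, terminated, _info = env.step(actions)                # joint step for all agents
    episode_return += reward
env.close()
```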
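The hyperparameters quoted in the Experiment Setup row can be restated as a training configuration. This is a minimal sketch assuming PyTorch; `policy` is a hypothetical placeholder for the role selector and role policies, which the paper does not release, and only the optimizer settings and the ϵ schedule are taken from the quoted text.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for RODE's role selector / role policy networks.
policy = nn.Linear(32, 8)

# RMSprop with lr 5x10^-4, alpha 0.99, no momentum or weight decay (as reported).
optimizer = torch.optim.RMSprop(
    policy.parameters(),
    lr=5e-4,
    alpha=0.99,
    momentum=0.0,
    weight_decay=0.0,
)

def epsilon(step, start=1.0, end=0.05, anneal_steps=50_000):
    """Linear epsilon-greedy schedule: 1.0 -> 0.05 over 50K steps, then held constant.
    For the three hard-exploration maps the paper extends anneal_steps to 500K."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)

# Training samples batches of 32 fully unrolled episodes from a replay buffer,
# e.g. (hypothetical helper): batch = replay_buffer.sample(32)
```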