Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data
Authors: Fuxiang Zhang, Chengxing Jia, Yi-Chen Li, Lei Yuan, Yang Yu, Zongzhang Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in cooperative MARL benchmarks, including the StarCraft multi-agent challenge, show that ODIS obtains superior performance in a wide range of tasks only with offline data from limited sources. |
| Researcher Affiliation | Collaboration | Fuxiang Zhang (1, 2), Chengxing Jia (1, 2), Yi-Chen Li (1), Lei Yuan (1, 2), Yang Yu (1, 2), Zongzhang Zhang (1). (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) Polixir Technologies |
| Pseudocode | No | The paper describes the ODIS algorithm using prose and mathematical equations but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code available at https://github.com/LAMDA-RL/ODIS |
| Open Datasets | Yes | Following guidelines in single-agent D4RL offline RL benchmarks (Fu et al., 2020; Qin et al., 2022b), we collect data with four types of qualities called expert, medium, medium-expert, and medium-replay, respectively. |
| Dataset Splits | Yes | We train all methods with offline data only from three source tasks and evaluate them in a wide range of unseen tasks. ... The detailed properties of these task sets can be seen in Tables 2, 3, and 4, respectively. |
| Hardware Specification | Yes | The training process of ODIS with an NVIDIA GeForce RTX 2080Ti GPU and a 32-core CPU costs 12-14 hours typically. |
| Software Dependencies | No | The paper mentions implementing ODIS with the 'PyMARL framework' but does not specify a version number for this framework or any other software dependencies. |
| Experiment Setup | Yes | Table 6 (Hyper-parameters of ODIS) lists: hidden layer dimension 64; attention dimension 64; coordination skill number 3 (marine-easy), 5 (marine-hard), 4 (stalker-zealot); 15,000 steps of coordination skill discovery; optimizer Adam; learning rate 0.0005. |
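For anyone reproducing the experiment setup, the Table 6 hyper-parameters can be collected into a single configuration object. This is a minimal sketch under stated assumptions: the class name `ODISConfig`, its field names, and the helper `num_skills_for` are hypothetical and are not taken from the ODIS repository or from PyMARL's configuration files. Only the numeric values and the task-set names come from Table 6.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ODISConfig:
    """Hypothetical container for the Table 6 hyper-parameters (field names are illustrative)."""

    hidden_dim: int = 64                  # hidden layer dimension
    attention_dim: int = 64               # attention dimension
    skill_discovery_steps: int = 15000    # steps of coordination skill discovery
    optimizer: str = "adam"               # optimizer reported in Table 6
    learning_rate: float = 0.0005         # learning rate reported in Table 6
    # The coordination skill number is the only value that differs per task set.
    skills_per_task_set: Dict[str, int] = field(
        default_factory=lambda: {
            "marine-easy": 3,
            "marine-hard": 5,
            "stalker-zealot": 4,
        }
    )

    def num_skills_for(self, task_set: str) -> int:
        """Return the coordination skill number for a task set, or raise ValueError if unknown."""
        try:
            return self.skills_per_task_set[task_set]
        except KeyError:
            raise ValueError(f"unknown task set: {task_set!r}") from None


if __name__ == "__main__":
    cfg = ODISConfig()
    for name in ("marine-easy", "marine-hard", "stalker-zealot"):
        print(name, cfg.num_skills_for(name))
```

Keeping the shared values as plain fields and the per-task-set skill number in one lookup makes the only task-dependent setting explicit, and raising on an unknown task-set name avoids silently falling back to a wrong skill count.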