Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data
Authors: Fuxiang Zhang, Chengxing Jia, Yi-Chen Li, Lei Yuan, Yang Yu, Zongzhang Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in cooperative MARL benchmarks, including the StarCraft multi-agent challenge, show that ODIS obtains superior performance in a wide range of tasks only with offline data from limited sources. |
| Researcher Affiliation | Collaboration | Fuxiang Zhang (1, 2), Chengxing Jia (1, 2), Yi-Chen Li (1), Lei Yuan (1, 2), Yang Yu (1, 2), Zongzhang Zhang (1). Affiliations: (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) Polixir Technologies |
| Pseudocode | No | The paper describes the ODIS algorithm using prose and mathematical equations but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code available at https://github.com/LAMDA-RL/ODIS |
| Open Datasets | Yes | Following guidelines in single-agent D4RL offline RL benchmarks (Fu et al., 2020; Qin et al., 2022b), we collect data with four types of qualities called expert, medium, medium-expert, and medium-replay, respectively. |
| Dataset Splits | Yes | We train all methods with offline data only from three source tasks and evaluate them in a wide range of unseen tasks. ... The detailed properties of these task sets can be seen in Tables 2, 3, and 4, respectively. |
| Hardware Specification | Yes | The training process of ODIS with an NVIDIA GeForce RTX 2080Ti GPU and a 32-core CPU costs 12-14 hours typically. |
| Software Dependencies | No | The paper mentions implementing ODIS with the 'PyMARL framework' but does not specify a version number for this framework or any other software dependencies. |
| Experiment Setup | Yes | Table 6 (Hyper-parameters of ODIS) lists: hidden layer dimension 64; attention dimension 64; coordination skill number 3 (marine-easy), 5 (marine-hard), 4 (stalker-zealot); steps of coordination skill discovery 15000; optimizer Adam; learning rate 0.0005. |
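
For readers who want to carry the reported setup into their own experiments, the values in the Experiment Setup row can be collected into a small configuration object. This is a minimal sketch only: the numbers come from the row above, but every class, field, and function name here is hypothetical and is not taken from the ODIS repository or the PyMARL framework.

```python
from dataclasses import dataclass

# Coordination skill number per task set, as reported in the Experiment Setup row.
# The dictionary name and its use as a lookup table are assumptions for illustration.
SKILLS_PER_TASK_SET = {
    "marine-easy": 3,
    "marine-hard": 5,
    "stalker-zealot": 4,
}


@dataclass(frozen=True)
class OdisHyperParams:
    """Hypothetical container for the hyper-parameters reported in the table."""

    task_set: str
    n_skills: int                       # coordination skill number
    hidden_dim: int = 64                # hidden layer dimension
    attention_dim: int = 64             # attention dimension
    skill_discovery_steps: int = 15000  # steps of coordination skill discovery
    optimizer: str = "Adam"
    learning_rate: float = 0.0005


def make_config(task_set: str) -> OdisHyperParams:
    """Build a config for one of the three task sets named in the table."""
    if task_set not in SKILLS_PER_TASK_SET:
        known = ", ".join(sorted(SKILLS_PER_TASK_SET))
        raise ValueError(f"unknown task set {task_set!r}; expected one of: {known}")
    return OdisHyperParams(task_set=task_set, n_skills=SKILLS_PER_TASK_SET[task_set])


if __name__ == "__main__":
    for name in SKILLS_PER_TASK_SET:
        print(make_config(name))
```

Only the skill number varies across task sets in the reported setup, so the sketch keeps it as the single per-task field and leaves the remaining values as shared defaults.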