MADiff: Offline Multi-agent Learning with Diffusion Models
Authors: Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that MADIFF outperforms baseline algorithms across various multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 ByteDance, 3 Stanford University; {zhengbangzhu, minghuanliu, maoliyuan, yyu, wnzhang}@sjtu.edu.cn, bingykang@gmail.com, {minkai, ermon}@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 Multi-Agent Planning with MADIFF; Algorithm 2 Multi-Agent Trajectory Prediction with MADIFF |
| Open Source Code | Yes | We provide code, anonymous data download link, and necessary instructions in supplementary materials. |
| Open Datasets | Yes | NBA dataset: the dataset consists of various basketball players' recorded trajectories from 631 games in the 2015-16 season. Following Alcorn and Nguyen [2021], we split 569/30/32 training/validation/test games, with downsampling from 25 Hz to 5 Hz. We use the offline datasets constructed by Pan et al. [2022], including four datasets... we use the off-the-grid offline dataset [Formanek et al., 2023]... |
| Dataset Splits | Yes | NBA dataset: the dataset consists of various basketball players' recorded trajectories from 631 games in the 2015-16 season. Following Alcorn and Nguyen [2021], we split 569/30/32 training/validation/test games, with downsampling from 25 Hz to 5 Hz. |
| Hardware Specification | Yes | On a server equipped with an AMD Ryzen 9 5900X (12 cores) CPU and an RTX 3090 GPU, we trained the MADIFF-C model on the Expert dataset from the MPE Spread task, achieving convergence in approximately one hour. The results are obtained on a server with an AMD Ryzen 9 5900X (12 cores) CPU and an RTX 3090 GPU, and are averaged over 1000 trials. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | We list the key hyperparameters of MADIFF we used in Table 4, Table 5, and Table 6. [These tables include: Learning rate, Guidance scale ω, Planning horizon H, History horizon, Batch size, Diffusion steps K, Reward discount γ, Optimizer] |
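
The NBA preprocessing quoted in the Open Datasets and Dataset Splits rows (631 games split 569/30/32, trajectories downsampled from 25 Hz to 5 Hz) can be illustrated with a minimal sketch. The function names `downsample` and `split_games` are hypothetical, and the seeded random shuffle is an assumption made only for illustration; the paper follows the split of Alcorn and Nguyen [2021], whose exact assignment of games is not given here.

```python
import random


def downsample(trajectory, source_hz=25, target_hz=5):
    """Keep every (source_hz // target_hz)-th frame of a trajectory."""
    if source_hz % target_hz != 0:
        raise ValueError("source_hz must be an integer multiple of target_hz")
    stride = source_hz // target_hz  # 25 Hz -> 5 Hz gives a stride of 5
    return trajectory[::stride]


def split_games(game_ids, n_train=569, n_val=30, n_test=32, seed=0):
    """Partition game ids into train/validation/test lists.

    Assumption: a seeded shuffle decides the assignment. The split
    actually used in the paper may assign games differently.
    """
    if len(game_ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must sum to the number of games")
    shuffled = list(game_ids)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_games(range(631))
    print(len(train), len(val), len(test))
    # One second of frames recorded at 25 Hz, reduced to 5 Hz.
    print(downsample(list(range(25))))
```

The split sizes sum to 569 + 30 + 32 = 631, matching the number of games reported, and a stride of 5 leaves 5 frames for each second of the original 25 Hz recording.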