Grounded Answers for Multi-agent Decision-making Problem through Generative World Model
Authors: Zeyang Liu, Xinrui Yang, Shiguang Sun, Long Qian, Lipeng Wan, Xingyu Chen, Xuguang Lan
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results demonstrate that this framework can improve the answers for multi-agent decision-making problems by showing superior performance on the training and unseen tasks of the StarCraft Multi-Agent Challenge benchmark. In particular, it can generate consistent interaction sequences and explainable reward functions at interaction states, opening the path for training generative models of the future. |
| Researcher Affiliation | Academia | Zeyang Liu (zeyang.liu@stu.xjtu.edu.cn), Xinrui Yang (xinrui.yang@stu.xjtu.edu.cn), Shiguang Sun (ssg2019@stu.xjtu.edu.cn), Long Qian (qianlongym@stu.xjtu.edu.cn), Lipeng Wan (wanlipeng77@xjtu.edu.cn), Xingyu Chen (chenxingyu_1990@xjtu.edu.cn), Xuguang Lan (xglan@mail.xjtu.edu.cn); National Key Laboratory of Human-Machine Hybrid Augmented Intelligence; National Engineering Research Center for Visual Information and Application; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We choose not to release the data and code at present. We would like to have the opportunity to further engage with the research community and to ensure that any future such releases are respectful, safe, and responsible. |
| Open Datasets | Yes | The training maps include 3s5z, 1c3s5z, 10m_vs_11m, 2c_vs_64zg, 3s_vs_5z, 5m_vs_6m, 6h_vs_8z, 3s5z_vs_3s6z, corridor, MMM2 in the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019). We use EMC (Zheng et al., 2021) and IIE (Liu et al., 2024) to collect 50,000 trajectories for each map and save these data as NPY files. (A storage sketch for this step appears after the table.) |
| Dataset Splits | No | The paper describes using 'training maps' and 'unseen testing maps' but does not explicitly mention a separate 'validation split' or 'validation set' with specific proportions or counts for hyperparameter tuning. |
| Hardware Specification | Yes | In this paper, all experiments are implemented with PyTorch and executed on eight NVIDIA A800 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide a specific version number for it or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | We train our image tokenizer for 100k steps using the AdamW optimizer, with cosine decay, using the hyperparameters in Table 8. The batch size is 32, and the learning rate is 1e-4. ... We build our dynamics model implementation based on Decision Transformer (Chen et al., 2021). The complete list of hyperparameters can be found in Table 9. (An optimizer-configuration sketch appears after the table.) |
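
The Open Datasets row states that 50,000 trajectories per SMAC map are collected (via EMC and IIE) and saved as NPY files, but no schema is given. Below is a minimal, hypothetical sketch of that storage step; the dict fields, shapes, and file name are assumptions for illustration, not the paper's actual format.

```python
import numpy as np

def save_trajectories(trajectories, path="smac_3s5z_trajectories.npy"):
    """Persist a list of per-episode dicts as a single NumPy object array."""
    np.save(path, np.array(trajectories, dtype=object), allow_pickle=True)

def load_trajectories(path="smac_3s5z_trajectories.npy"):
    """Load trajectories back; allow_pickle is required for object arrays."""
    return np.load(path, allow_pickle=True)

# Example: two toy episodes with observations, actions, and rewards.
episodes = [
    {"obs": np.zeros((10, 8)), "actions": np.zeros(10, dtype=np.int64), "rewards": np.zeros(10)},
    {"obs": np.ones((7, 8)), "actions": np.ones(7, dtype=np.int64), "rewards": np.ones(7)},
]
save_trajectories(episodes)
assert len(load_trajectories()) == 2
```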
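The Experiment Setup row reports the tokenizer's optimization settings but not the surrounding code. The sketch below shows one plausible PyTorch realization of those settings (AdamW, cosine learning-rate decay over 100k steps, batch size 32, peak learning rate 1e-4); the model and loss are placeholders, since the actual architecture hyperparameters live in the paper's Tables 8 and 9 and are not reproduced here.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(256, 256)                # stand-in for the image tokenizer
optimizer = AdamW(model.parameters(), lr=1e-4)   # reported: AdamW, lr 1e-4
scheduler = CosineAnnealingLR(optimizer, T_max=100_000)  # reported: cosine decay

for step in range(100_000):                      # reported: 100k training steps
    batch = torch.randn(32, 256)                 # stand-in batch of size 32
    loss = (model(batch) - batch).pow(2).mean()  # placeholder reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                             # decay the learning rate per step
```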