Improving Multi-agent Reinforcement Learning with Stable Prefix Policy
Authors: Yue Deng, Zirui Wang, Yin Zhang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We scale our approach to various value-based MARL methods and empirically verify our method in a cooperative MARL task, SMAC benchmarks. Experimental results demonstrate that our method achieves not only better performance but also faster convergence speed than baseline algorithms within early time steps. |
| Researcher Affiliation | Academia | College of Computer Science and Technology, Zhejiang University {devindeng, ziseoiwong, zhangyin98}@zju.edu.cn |
| Pseudocode | Yes | The pseudo-code is provided in Appendix A. |
| Open Source Code | No | The paper does not provide a direct link to its source code or explicitly state that its code is open-sourced or available in supplementary materials. It mentions codebases for baselines, but not for its own implementation. |
| Open Datasets | Yes | We evaluate the performance of our method via the fully cooperative StarCraft II micro-management challenges by the mean winning rate in each scenario... SMAC: We verify our proposed stable prefix policy methods on 6 subtasks of two difficulties... The details of other SMAC tasks are shown in Appendix B. |
| Dataset Splits | No | The paper does not explicitly describe a validation dataset split (e.g., percentages or specific counts) from its experimental environment (SMAC benchmarks). While it discusses training and testing, a distinct validation set split is not detailed. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions external codebases used for baselines (e.g., "QMIX, QPLEX, and W-QMIX in this paper are from pymarl codebase [Hu et al., 2021]" or "MACPF is from the codebase [Zhang et al., 2021; Wang et al., 2023]"), but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Other hyper-parameters are in Appendix C. |