Improving Multi-agent Reinforcement Learning with Stable Prefix Policy

Authors: Yue Deng, Zirui Wang, Yin Zhang

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We scale our approach to various value-based MARL methods and empirically verify our method in a cooperative MARL task, SMAC benchmarks. Experimental results demonstrate that our method achieves not only better performance but also faster convergence speed than baseline algorithms within early time steps."
Researcher Affiliation | Academia | "College of Computer Science and Technology, Zhejiang University {devindeng, ziseoiwong, zhangyin98}@zju.edu.cn"
Pseudocode | Yes | "The pseudo-code is provided in Appendix A."
Open Source Code | No | The paper does not provide a link to its source code, nor does it state that the code is open-sourced or included in supplementary materials. It cites codebases for the baselines, but not for its own implementation.
Open Datasets | Yes | "We evaluate the performance of our method via the fully cooperative StarCraft II micro-management challenges by the mean winning rate in each scenario... SMAC: We verify our proposed stable prefix policy methods on 6 subtasks of two difficulties... The details of other SMAC tasks are shown in Appendix B."
Dataset Splits | No | The paper does not describe a validation split (e.g., percentages or example counts) for its experimental environment, the SMAC benchmarks. It discusses training and testing, but no distinct validation set is detailed.
Hardware Specification | No | The paper gives no details about the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper names the codebases used for baselines (e.g., "QMIX, QPLEX, and W-QMIX in this paper are from pymarl codebase [Hu et al., 2021]" and "MACPF is from the codebase [Zhang et al., 2021; Wang et al., 2023]"), but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "Other hyper-parameters are in Appendix C."