Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Improving Multi-agent Reinforcement Learning with Stable Prefix Policy
Authors: Yue Deng, Zirui Wang, Yin Zhang
IJCAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We scale our approach to various value-based MARL methods and empirically verify our method in a cooperative MARL task, SMAC benchmarks. Experimental results demonstrate that our method achieves not only better performance but also faster convergence speed than baseline algorithms within early time steps. |
| Researcher Affiliation | Academia | College of Computer Science and Technology, Zhejiang University |
| Pseudocode | Yes | The pseudo-code is provided in Appendix A. |
| Open Source Code | No | The paper does not provide a direct link to its source code or explicitly state that its code is open-sourced or available in supplementary materials. It mentions codebases for baselines, but not for its own implementation. |
| Open Datasets | Yes | We evaluate the performance of our method via the fully cooperative StarCraft II micro-management challenges by the mean winning rate in each scenario... SMAC: We verify our proposed stable prefix policy methods on 6 subtasks of two difficulties... The details of other SMAC tasks are shown in Appendix B. |
| Dataset Splits | No | The paper does not explicitly describe a validation dataset split (e.g., percentages or specific counts) from its experimental environment (SMAC benchmarks). While it discusses training and testing, a distinct validation set split is not detailed. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions external codebases used for baselines (e.g., "QMIX, QPLEX, and W-QMIX in this paper are from pymarl codebase [Hu et al., 2021]" or "MACPF is from the codebase [Zhang et al., 2021; Wang et al., 2023]"), but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Other hyper-parameters are in Appendix C. |