More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization
Authors: Jiangxing Wang, Deheng Ye, Zongqing Lu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify MACPF in various cooperative MARL tasks and demonstrate that MACPF achieves better performance or faster convergence than baselines. Our code is available at https://github.com/PKU-RL/FOP-DMAC-MACPF. ... In this section, we evaluate MACPF in three different scenarios. |
| Researcher Affiliation | Collaboration | Jiangxing Wang, School of Computer Science, Peking University (jiangxiw@stu.pku.edu.cn); Deheng Ye, Tencent Inc. (dericye@tencent.com); Zongqing Lu, School of Computer Science, Peking University (zongqing.lu@pku.edu.cn) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/PKU-RL/FOP-DMAC-MACPF. |
| Open Datasets | Yes | We evaluate MACPF in several tasks, including matrix game (Rashid et al., 2020), SMAC (Samvelyan et al., 2019), and MPE (Lowe et al., 2017). A hedged environment-loading sketch follows the table. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits needed to reproduce the experiment. |
| Hardware Specification | Yes | All models are built on PyTorch and are trained on a machine with 1 Nvidia GPU (RTX 1060) and 8 AMD CPU cores. ... All models are built on PyTorch and are trained on a machine with 4 Nvidia GPUs (A100) and 224 Intel CPU cores. ... For 3s5z_vs_3s6z, all models are built on PyTorch and are trained on a machine with 1 Nvidia GPU (RTX 2080 Ti) and 16 Intel CPU cores. ... All models are built on PyTorch and are trained on a machine with 1 Nvidia GPU (RTX 2080 Ti) and 16 Intel CPU cores. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or other software dependencies. A hedged run-time version-logging sketch follows the table. |
| Experiment Setup | Yes | In the matrix game, we use a learning rate of 3 × 10⁻⁴ for all algorithms. ... The batch size used in the experiment is 64 for FOP, MACPF, QMIX, and QPLEX, and 32 for MAPPO. ... In StarCraft II, for MACPF, we use a learning rate of 5 × 10⁻⁴. The target networks are updated after every 200 training episodes. The temperature parameters α and α_i are annealed from 0.5 to 0.05 over 200k time steps for all easy and hard maps and fixed at 0.001 for all super-hard maps. ... In MPE (MIT license), ... For QMIX, QPLEX, FOP, and MACPF, we use a learning rate of 5 × 10⁻⁴. For FOP and MACPF, α decays from 0.5 to 0.05 over 50k time steps. For QMIX and QPLEX, ϵ decays from 1 to 0.05 over 50k time steps. The batch size used in the experiment is 64. The quoted hyperparameters are gathered into a single config sketch after the table. |
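The environments cited in the Open Datasets row are all publicly available. As a minimal loading sketch, assuming the oxwhirl `smac` package and a local StarCraft II installation (the map name below is our illustrative choice, not one drawn from the paper's result tables), a SMAC map can be instantiated and rolled out with a random policy:

```python
# Minimal SMAC rollout sketch. Assumes the `smac` package
# (https://github.com/oxwhirl/smac) and a StarCraft II install;
# the map name is illustrative only.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3s5z")  # hypothetical map choice
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)  # per-agent action mask
        actions.append(np.random.choice(np.nonzero(avail)[0]))  # random valid action
    reward, terminated, info = env.step(actions)
    episode_return += reward
env.close()
print("episode return:", episode_return)
```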
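Because the paper names PyTorch without a version and reports different machines for different experiments, a reproducer may want to log the actual software and hardware at run time. A minimal sketch using only standard-library and PyTorch calls:

```python
# Log the software/hardware configuration of a run, since the paper
# pins neither a PyTorch version nor a single machine specification.
import os
import platform
import torch

print("python :", platform.python_version())
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)  # None on CPU-only builds
print("cpus   :", os.cpu_count())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"gpu {i}  :", torch.cuda.get_device_name(i))
```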
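The hyperparameters quoted in the Experiment Setup row can be collected into a single config. A sketch under the assumption that the quoted values are complete per domain; the key names are ours, and the released code may organize them differently:

```python
# Hyperparameters as quoted from the paper, gathered in one place.
# Key names are illustrative, not taken from the released code.
CONFIGS = {
    "matrix_game": {
        "lr": 3e-4,  # all algorithms
        "batch_size": {"FOP": 64, "MACPF": 64, "QMIX": 64,
                       "QPLEX": 64, "MAPPO": 32},
    },
    "smac": {  # StarCraft II, MACPF
        "lr": 5e-4,
        "target_update_interval": 200,          # training episodes
        "alpha_anneal": (0.5, 0.05, 200_000),   # easy and hard maps
        "alpha_super_hard": 0.001,              # fixed on super-hard maps
    },
    "mpe": {
        "lr": 5e-4,                              # QMIX, QPLEX, FOP, MACPF
        "alpha_anneal": (0.5, 0.05, 50_000),     # FOP, MACPF
        "epsilon_anneal": (1.0, 0.05, 50_000),   # QMIX, QPLEX
        "batch_size": 64,
    },
}

def linear_anneal(start: float, end: float, horizon: int, t: int) -> float:
    """Linearly anneal a coefficient from `start` to `end` over `horizon` steps."""
    frac = min(max(t / horizon, 0.0), 1.0)
    return start + frac * (end - start)
```

For example, `linear_anneal(0.5, 0.05, 200_000, t)` reproduces the quoted α schedule for easy and hard SMAC maps at time step `t`.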