More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization
Authors: Jiangxing Wang, Deheng Ye, Zongqing Lu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify MACPF in various cooperative MARL tasks and demonstrate that MACPF achieves better performance or faster convergence than baselines. Our code is available at https://github.com/PKU-RL/FOP-DMAC-MACPF. ... In this section, we evaluate MACPF in three different scenarios. |
| Researcher Affiliation | Collaboration | Jiangxing Wang, School of Computer Science, Peking University (jiangxiw@stu.pku.edu.cn); Deheng Ye, Tencent Inc. (dericye@tencent.com); Zongqing Lu, School of Computer Science, Peking University (zongqing.lu@pku.edu.cn) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/PKU-RL/FOP-DMAC-MACPF. |
| Open Datasets | Yes | We evaluate MACPF in several tasks, including matrix game (Rashid et al., 2020), SMAC (Samvelyan et al., 2019), and MPE (Lowe et al., 2017). A hedged environment-loading sketch follows the table. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits needed to reproduce the experiment. |
| Hardware Specification | Yes | All models are built on PyTorch and are trained on a machine with 1 Nvidia GPU (RTX 1060) and 8 AMD CPU cores. ... All models are built on PyTorch and are trained on a machine with 4 Nvidia GPUs (A100) and 224 Intel CPU cores. ... For 3s5z_vs_3s6z, all models are built on PyTorch and are trained on a machine with 1 Nvidia GPU (RTX 2080 Ti) and 16 Intel CPU cores. ... All models are built on PyTorch and are trained on a machine with 1 Nvidia GPU (RTX 2080 Ti) and 16 Intel CPU cores. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or other software dependencies. A hedged run-time version-logging sketch follows the table. |
| Experiment Setup | Yes | In the matrix game, we use a learning rate of 3 × 10⁻⁴ for all algorithms. ... The batch size used in the experiment is 64 for FOP, MACPF, QMIX, and QPLEX, and 32 for MAPPO. ... In StarCraft II, for MACPF, we use a learning rate of 5 × 10⁻⁴. The target networks are updated after every 200 training episodes. The temperature parameters α and α_i are annealed from 0.5 to 0.05 over 200k time steps for all easy and hard maps and fixed at 0.001 for all super-hard maps. ... In MPE (MIT license), ... For QMIX, QPLEX, FOP, and MACPF, we use a learning rate of 5 × 10⁻⁴. For FOP and MACPF, α decays from 0.5 to 0.05 over 50k time steps. For QMIX and QPLEX, ϵ decays from 1 to 0.05 over 50k time steps. The batch size used in the experiment is 64. The quoted hyperparameters are gathered into a single config sketch after the table. |
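The environments cited in the Open Datasets row are all publicly available. As a minimal loading sketch, assuming the oxwhirl `smac` package and a local StarCraft II installation (the map name below is our illustrative choice, not one drawn from the paper's result tables), a SMAC map can be instantiated and rolled out with a random policy:

```python
# Minimal SMAC rollout sketch. Assumes the `smac` package
# (https://github.com/oxwhirl/smac) and a StarCraft II install;
# the map name is illustrative only.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3s5z")  # hypothetical map choice
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)  # per-agent action mask
        actions.append(np.random.choice(np.nonzero(avail)[0]))  # random valid action
    reward, terminated, info = env.step(actions)
    episode_return += reward
env.close()
print("episode return:", episode_return)
```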
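Because the paper names PyTorch without a version and reports different machines for different experiments, a reproducer may want to log the actual software and hardware at run time. A minimal sketch using only standard-library and PyTorch calls:

```python
# Log the software/hardware configuration of a run, since the paper
# pins neither a PyTorch version nor a single machine specification.
import os
import platform
import torch

print("python :", platform.python_version())
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)  # None on CPU-only builds
print("cpus   :", os.cpu_count())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"gpu {i}  :", torch.cuda.get_device_name(i))
```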
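The hyperparameters quoted in the Experiment Setup row can be collected into a single config. A sketch under the assumption that the quoted values are complete per domain; the key names are ours, and the released code may organize them differently:

```python
# Hyperparameters as quoted from the paper, gathered in one place.
# Key names are illustrative, not taken from the released code.
CONFIGS = {
    "matrix_game": {
        "lr": 3e-4,  # all algorithms
        "batch_size": {"FOP": 64, "MACPF": 64, "QMIX": 64,
                       "QPLEX": 64, "MAPPO": 32},
    },
    "smac": {  # StarCraft II, MACPF
        "lr": 5e-4,
        "target_update_interval": 200,          # training episodes
        "alpha_anneal": (0.5, 0.05, 200_000),   # easy and hard maps
        "alpha_super_hard": 0.001,              # fixed on super-hard maps
    },
    "mpe": {
        "lr": 5e-4,                              # QMIX, QPLEX, FOP, MACPF
        "alpha_anneal": (0.5, 0.05, 50_000),     # FOP, MACPF
        "epsilon_anneal": (1.0, 0.05, 50_000),   # QMIX, QPLEX
        "batch_size": 64,
    },
}

def linear_anneal(start: float, end: float, horizon: int, t: int) -> float:
    """Linearly anneal a coefficient from `start` to `end` over `horizon` steps."""
    frac = min(max(t / horizon, 0.0), 1.0)
    return start + frac * (end - start)
```

For example, `linear_anneal(0.5, 0.05, 200_000, t)` reproduces the quoted α schedule for easy and hard SMAC maps at time step `t`.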