FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning
Authors: Tianhao Zhang, Yueheng Li, Chen Wang, Guangming Xie, Zongqing Lu
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, in the well-known matrix game and differential game, we verify that FOP can converge to the global optimum for both discrete and continuous action spaces. We also evaluate FOP on a set of StarCraft II micromanagement tasks, and demonstrate that FOP substantially outperforms state-of-the-art decomposed value-based and actor-critic methods. |
| Researcher Affiliation | Academia | Peking University. |
| Pseudocode | Yes | Algorithm 1 FOP |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository. |
| Open Datasets | Yes | We evaluate FOP on the challenging StarCraft Multi-Agent Challenge (SMAC) benchmark (Samvelyan et al., 2019) |
| Dataset Splits | No | The paper mentions using the StarCraft II micromanagement tasks benchmark but does not specify the train/validation/test splits used, or whether standard splits from the benchmark were used (e.g., specific percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper describes the FOP architecture and learning objectives but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of training steps for networks, optimizer settings) or other detailed training configurations in the main text. |
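The pseudocode row above refers to the paper's Algorithm 1 (FOP). Since the report finds no released code, the following is a minimal, hedged sketch of the core idea named in the title: each agent's maximum-entropy policy is the softmax (Boltzmann) distribution induced by an individual Q-function with its own temperature α_i, and the joint policy is the product of the individual policies. This is an illustration under those assumptions, not the authors' implementation; the function names, shapes, and toy Q-values are hypothetical.

```python
# Minimal sketch (not the authors' code): a maximum-entropy factorized joint
# policy in the spirit of FOP. Each agent i turns an individual Q-function
# into a soft (Boltzmann) policy with its own temperature alpha_i, and the
# joint policy is the product of the individual policies. All names, shapes,
# and the toy Q-values below are hypothetical.
import numpy as np

def soft_policy(q_values: np.ndarray, alpha: float) -> np.ndarray:
    """Boltzmann policy: pi(a) proportional to exp(Q(a) / alpha)."""
    logits = q_values / alpha
    logits -= logits.max()              # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def factorized_joint_policy(per_agent_q: list[np.ndarray],
                            alphas: list[float]) -> np.ndarray:
    """Joint policy built as the product of per-agent soft policies."""
    joint = np.array([1.0])
    for q_i, alpha_i in zip(per_agent_q, alphas):
        pi_i = soft_policy(q_i, alpha_i)
        joint = np.outer(joint, pi_i).ravel()
    n_actions = [len(q) for q in per_agent_q]
    return joint.reshape(n_actions)     # shape: (|A_1|, ..., |A_n|)

# Toy example: two agents with three discrete actions each (hypothetical values).
q1 = np.array([1.0, 2.0, 0.5])
q2 = np.array([0.0, 1.5, 1.0])
pi_joint = factorized_joint_policy([q1, q2], alphas=[0.5, 0.5])
print(pi_joint.round(3))                # 3x3 table of joint-action probabilities
```

Because each row of the sketch's joint table is a product of independent per-agent distributions, any agent can sample its own action locally, which is the decentralized-execution property the factorization is meant to preserve.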