FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning

Authors: Tianhao Zhang, Yueheng Li, Chen Wang, Guangming Xie, Zongqing Lu

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, in the well-known matrix game and differential game, we verify that FOP can converge to the global optimum for both discrete and continuous action spaces. We also evaluate FOP on a set of StarCraft II micromanagement tasks, and demonstrate that FOP substantially outperforms state-of-the-art decomposed value-based and actor-critic methods.
Researcher Affiliation | Academia | Peking University.
Pseudocode | Yes | Algorithm 1 FOP
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | We evaluate FOP on the challenging StarCraft Multi-Agent Challenge (SMAC) benchmark (Samvelyan et al., 2019). (A usage sketch for the SMAC benchmark follows this table.)
Dataset Splits | No | The paper mentions the StarCraft II micromanagement benchmark but does not specify train/validation/test splits or whether standard splits from the benchmark were used (e.g., specific percentages or sample counts).
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | No | The paper describes the FOP architecture and learning objectives but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of training steps, optimizer settings) or other detailed training configurations in the main text.
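As a concrete reference for the Open Datasets row above, the following is a minimal sketch of how a SMAC micromanagement map is typically driven through the public smac library (github.com/oxwhirl/smac). It is not taken from the paper's code, which is not released; the map name "3m" and the random action selection are illustrative placeholders only.

```python
# Minimal sketch of a SMAC episode loop using the public smac library.
# Assumptions: the "3m" map and the random policy are placeholders,
# not the FOP paper's actual setup.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    obs = env.get_obs()      # per-agent local observations
    state = env.get_state()  # global state, used only for centralized training
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)  # 0/1 mask of legal actions
        avail_ids = np.nonzero(avail)[0]
        actions.append(np.random.choice(avail_ids))    # random placeholder policy
    reward, terminated, info = env.step(actions)
    episode_return += reward

print("episode return:", episode_return)
env.close()
```

In a decomposed actor-critic method such as FOP, the random action selection above would be replaced by sampling from each agent's individual policy conditioned on its local observation history, while the global state would be consumed only by the centralized critic during training.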