FACMAC: Factored Multi-Agent Centralised Policy Gradients

Authors: Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Boehmer, Shimon Whiteson

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks. Empirical results demonstrate FACMAC's superior performance over MADDPG and other baselines on all three domains.
Researcher Affiliation | Collaboration | Bei Peng (University of Liverpool), Tabish Rashid (University of Oxford), Christian A. Schroeder de Witt (University of Oxford), Pierre-Alexandre Kamienny (Facebook AI Research), Philip H. S. Torr (University of Oxford), Wendelin Böhmer (Delft University of Technology), Shimon Whiteson (University of Oxford)
Pseudocode | No | The paper describes algorithms and derivations but does not present a formal pseudocode block or an algorithm box. (A hedged sketch of the described update is given after this table.)
Open Source Code | Yes | Code is available at https://github.com/oxwhirl/facmac.
Open Datasets | Yes | We evaluate FACMAC on variants of the multi-agent particle environments [23]... and the challenging SMAC benchmark [35]...
Dataset Splits | No | The paper describes using a replay buffer and mini-batches, and mentions training for a certain number of timesteps. It also uses 'test' win rate, but does not explicitly describe a train/validation/test split for the datasets themselves (e.g., in terms of percentages or counts).
Hardware Specification | Yes | All experiments were run on a single NVIDIA GeForce RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions 'OpenAI Gym [5]', the 'PyMARL [35]' framework, and 'MuJoCo', but it does not specify version numbers for any software dependencies. It mentions 'SC2.4.10', but that is a version of StarCraft II, not a software dependency of their code.
Experiment Setup | Yes | We use Adam optimizer [17]... learning rate 5e-4... replay buffer of size 1e6... batch size 32... hidden layer size 64 for all networks... training for 2 million timesteps... The discount factor γ is 0.99... Target networks are updated using polyak averaging with smoothing coefficient τ = 0.005. (These settings are collected in the configuration sketch after the table.)
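
Since the paper provides no pseudocode block, the following is a minimal PyTorch sketch of the update FACMAC is described as performing: per-agent utilities are mixed into a joint Q_tot by a monotonic, state-conditioned mixer, and a centralised policy gradient evaluates all agents' actions, sampled from the current policies, through that single factored critic. Network sizes, module names, and the one-layer mixer below are illustrative assumptions, not the authors' implementation (see https://github.com/oxwhirl/facmac for that).

```python
import torch
import torch.nn as nn

# Illustrative sizes only.
N_AGENTS, OBS_DIM, ACT_DIM, STATE_DIM, HIDDEN = 3, 8, 2, 16, 64

class Actor(nn.Module):
    """Deterministic per-agent policy mu_a(o_a) -> u_a."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Per-agent utility Q_a(o_a, u_a)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class Mixer(nn.Module):
    """Monotonic mixing of per-agent utilities into Q_tot, conditioned on the
    global state; a one-layer stand-in for a QMIX-style hypernetwork mixer."""
    def __init__(self):
        super().__init__()
        self.hyper_w = nn.Linear(STATE_DIM, N_AGENTS)
        self.hyper_b = nn.Linear(STATE_DIM, 1)
    def forward(self, agent_qs, state):
        w = torch.abs(self.hyper_w(state))  # non-negative weights => monotonic in each Q_a
        return (w * agent_qs).sum(-1, keepdim=True) + self.hyper_b(state)

actors = nn.ModuleList([Actor() for _ in range(N_AGENTS)])
critics = nn.ModuleList([Critic() for _ in range(N_AGENTS)])
mixer = Mixer()

# Dummy batch standing in for transitions sampled from the replay buffer.
B = 32
obs = torch.randn(B, N_AGENTS, OBS_DIM)
state = torch.randn(B, STATE_DIM)

# Centralised policy gradient: every agent's action comes from the *current*
# policies (unlike MADDPG, which takes the other agents' actions from the
# buffer) and is evaluated through the single factored critic Q_tot.
acts = torch.stack([actors[a](obs[:, a]) for a in range(N_AGENTS)], dim=1)
agent_qs = torch.cat([critics[a](obs[:, a], acts[:, a]) for a in range(N_AGENTS)], dim=-1)
policy_loss = -mixer(agent_qs, state).mean()
policy_loss.backward()  # gradients reach all actors through Q_tot
```

The critic side, omitted here for brevity, would additionally be trained with a standard TD target computed from target copies of the critics and mixer.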
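
The hyperparameters quoted in the Experiment Setup row, collected into a configuration sketch; the dictionary keys are illustrative assumptions and not necessarily the keys used in the authors' PyMARL configs. The `polyak_update` helper (a hypothetical name) shows how the quoted smoothing coefficient τ = 0.005 is typically applied to target networks.

```python
import torch

# Hyperparameters quoted above; key names are illustrative assumptions.
config = {
    "optimizer": "adam",      # Adam optimizer [17]
    "lr": 5e-4,               # learning rate
    "buffer_size": int(1e6),  # replay buffer size
    "batch_size": 32,
    "hidden_dim": 64,         # hidden layer size for all networks
    "t_max": 2_000_000,       # total environment timesteps of training
    "gamma": 0.99,            # discount factor
    "tau": 0.005,             # polyak smoothing coefficient for target networks
}

def polyak_update(target: torch.nn.Module, source: torch.nn.Module,
                  tau: float = config["tau"]) -> None:
    """Soft target update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    with torch.no_grad():
        for p_t, p_s in zip(target.parameters(), source.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p_s)
```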