FACMAC: Factored Multi-Agent Centralised Policy Gradients

Authors: Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Boehmer, Shimon Whiteson

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks. Empirical results demonstrate FACMAC's superior performance over MADDPG and other baselines on all three domains.
Researcher Affiliation | Collaboration | Bei Peng (University of Liverpool), Tabish Rashid (University of Oxford), Christian A. Schroeder de Witt (University of Oxford), Pierre-Alexandre Kamienny (Facebook AI Research), Philip H. S. Torr (University of Oxford), Wendelin Böhmer (Delft University of Technology), Shimon Whiteson (University of Oxford)
Pseudocode | No | The paper describes algorithms and derivations but does not present a formal pseudocode block or an algorithm box. (A hedged sketch of the described update is given after this table.)
Open Source Code | Yes | Code is available at https://github.com/oxwhirl/facmac.
Open Datasets | Yes | We evaluate FACMAC on variants of the multi-agent particle environments [23]... and the challenging SMAC benchmark [35]...
Dataset Splits | No | The paper describes using a replay buffer and mini-batches, and mentions training for a certain number of timesteps. It also uses 'test' win rate, but does not explicitly describe a train/validation/test split for the datasets themselves (e.g., in terms of percentages or counts).
Hardware Specification | Yes | All experiments were run on a single NVIDIA GeForce RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions 'OpenAI Gym [5]', the 'PyMARL [35]' framework, and 'MuJoCo', but it does not specify version numbers for any software dependencies. It mentions 'SC2.4.10', but that is a version of StarCraft II, not a software dependency of their code.
Experiment Setup | Yes | We use Adam optimizer [17]... learning rate 5e-4... replay buffer of size 1e6... batch size 32... hidden layer size 64 for all networks... training for 2 million timesteps... The discount factor γ is 0.99... Target networks are updated using polyak averaging with smoothing coefficient τ = 0.005. (These settings are collected in the configuration sketch after the table.)
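
Since the paper provides no pseudocode block, the following is a minimal PyTorch sketch of the update FACMAC is described as performing: per-agent utilities are mixed into a joint Q_tot by a monotonic, state-conditioned mixer, and a centralised policy gradient evaluates all agents' actions, sampled from the current policies, through that single factored critic. Network sizes, module names, and the one-layer mixer below are illustrative assumptions, not the authors' implementation (see https://github.com/oxwhirl/facmac for that).

```python
import torch
import torch.nn as nn

# Illustrative sizes only.
N_AGENTS, OBS_DIM, ACT_DIM, STATE_DIM, HIDDEN = 3, 8, 2, 16, 64

class Actor(nn.Module):
    """Deterministic per-agent policy mu_a(o_a) -> u_a."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Per-agent utility Q_a(o_a, u_a)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class Mixer(nn.Module):
    """Monotonic mixing of per-agent utilities into Q_tot, conditioned on the
    global state; a one-layer stand-in for a QMIX-style hypernetwork mixer."""
    def __init__(self):
        super().__init__()
        self.hyper_w = nn.Linear(STATE_DIM, N_AGENTS)
        self.hyper_b = nn.Linear(STATE_DIM, 1)
    def forward(self, agent_qs, state):
        w = torch.abs(self.hyper_w(state))  # non-negative weights => monotonic in each Q_a
        return (w * agent_qs).sum(-1, keepdim=True) + self.hyper_b(state)

actors = nn.ModuleList([Actor() for _ in range(N_AGENTS)])
critics = nn.ModuleList([Critic() for _ in range(N_AGENTS)])
mixer = Mixer()

# Dummy batch standing in for transitions sampled from the replay buffer.
B = 32
obs = torch.randn(B, N_AGENTS, OBS_DIM)
state = torch.randn(B, STATE_DIM)

# Centralised policy gradient: every agent's action comes from the *current*
# policies (unlike MADDPG, which takes the other agents' actions from the
# buffer) and is evaluated through the single factored critic Q_tot.
acts = torch.stack([actors[a](obs[:, a]) for a in range(N_AGENTS)], dim=1)
agent_qs = torch.cat([critics[a](obs[:, a], acts[:, a]) for a in range(N_AGENTS)], dim=-1)
policy_loss = -mixer(agent_qs, state).mean()
policy_loss.backward()  # gradients reach all actors through Q_tot
```

The critic side, omitted here for brevity, would additionally be trained with a standard TD target computed from target copies of the critics and mixer.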
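
The hyperparameters quoted in the Experiment Setup row, collected into a configuration sketch; the dictionary keys are illustrative assumptions and not necessarily the keys used in the authors' PyMARL configs. The `polyak_update` helper (a hypothetical name) shows how the quoted smoothing coefficient τ = 0.005 is typically applied to target networks.

```python
import torch

# Hyperparameters quoted above; key names are illustrative assumptions.
config = {
    "optimizer": "adam",      # Adam optimizer [17]
    "lr": 5e-4,               # learning rate
    "buffer_size": int(1e6),  # replay buffer size
    "batch_size": 32,
    "hidden_dim": 64,         # hidden layer size for all networks
    "t_max": 2_000_000,       # total environment timesteps of training
    "gamma": 0.99,            # discount factor
    "tau": 0.005,             # polyak smoothing coefficient for target networks
}

def polyak_update(target: torch.nn.Module, source: torch.nn.Module,
                  tau: float = config["tau"]) -> None:
    """Soft target update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    with torch.no_grad():
        for p_t, p_s in zip(target.parameters(), source.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p_s)
```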