FACMAC: Factored Multi-Agent Centralised Policy Gradients
Authors: Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Boehmer, Shimon Whiteson
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks. Empirical results demonstrate FACMAC's superior performance over MADDPG and other baselines on all three domains. |
| Researcher Affiliation | Collaboration | Bei Peng (University of Liverpool), Tabish Rashid (University of Oxford), Christian A. Schroeder de Witt (University of Oxford), Pierre-Alexandre Kamienny (Facebook AI Research), Philip H. S. Torr (University of Oxford), Wendelin Böhmer (Delft University of Technology), Shimon Whiteson (University of Oxford) |
| Pseudocode | No | The paper describes algorithms and derivations but does not present a formal pseudocode block or an algorithm box. |
| Open Source Code | Yes | Code is available at https://github.com/oxwhirl/facmac. |
| Open Datasets | Yes | We evaluate FACMAC on variants of the multi-agent particle environments [23]... and the challenging SMAC benchmark [35]... |
| Dataset Splits | No | The paper describes using a replay buffer and mini-batches and mentions training for a fixed number of timesteps. It also reports a 'test' win rate, but it does not explicitly describe a train/validation/test split of the datasets themselves (e.g., as percentages or counts). |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym [5]', the 'PyMARL [35]' framework, and 'MuJoCo', but it does not specify version numbers for any software dependencies. It mentions 'SC2.4.10', but that is a StarCraft II version, not a version of a software dependency of their code. |
| Experiment Setup | Yes | We use Adam optimizer [17]... learning rate 5e-4... replay buffer of size 1e6... batch size 32... hidden layer size 64 for all networks... training for 2 million timesteps... The discount factor γ is 0.99... Target networks are updated using polyak averaging with smoothing coefficient τ = 0.005. |
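
For convenience, the reported experiment setup can be collected into a single configuration object. The sketch below is a hypothetical Python summary of the hyperparameters quoted above; the key names and the `polyak_update` helper are illustrative assumptions and do not mirror the actual config files in the FACMAC repository.

```python
# Hypothetical config dict summarising the hyperparameters reported in the paper.
# Key names are illustrative, not taken from the oxwhirl/facmac codebase.
facmac_config = {
    "optimizer": "Adam",            # Adam optimizer [17]
    "learning_rate": 5e-4,          # learning rate
    "replay_buffer_size": 1_000_000,  # replay buffer of size 1e6
    "batch_size": 32,
    "hidden_dim": 64,               # hidden layer size for all networks
    "total_timesteps": 2_000_000,   # training for 2 million timesteps
    "gamma": 0.99,                  # discount factor
    "tau": 0.005,                   # Polyak smoothing coefficient for target networks
}


def polyak_update(target_params, online_params, tau=facmac_config["tau"]):
    """Soft (Polyak) target update: theta_target <- tau * theta + (1 - tau) * theta_target.

    Parameters are plain lists of floats here for illustration; a real
    implementation would iterate over the network's parameter tensors.
    """
    return [tau * p + (1.0 - tau) * t for p, t in zip(online_params, target_params)]
```

With `tau = 0.005`, each update moves the target network only 0.5% of the way toward the online network, which matches the slow target tracking described in the setup.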