Counterfactual Multi-Agent Policy Gradients
Authors: Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate COMA in the testbed of StarCraft unit micromanagement... COMA significantly improves average performance over other multi-agent actor-critic methods in this setting... |
| Researcher Affiliation | Academia | Jakob N. Foerster University of Oxford, United Kingdom jakob.foerster@cs.ox.ac.uk Gregory Farquhar University of Oxford, United Kingdom gregory.farquhar@cs.ox.ac.uk Triantafyllos Afouras University of Oxford, UK afourast@robots.ox.ac.uk Nantas Nardelli University of Oxford, UK nantas@robots.ox.ac.uk Shimon Whiteson University of Oxford, UK shimon.whiteson@cs.ox.ac.uk |
| Pseudocode | Yes | Pseudocode and further details on the training procedure are in the supplementary material. |
| Open Source Code | No | The paper mentions 'Pseudocode and further details on the training procedure are in the supplementary material,' but does not explicitly state that the source code for their methodology is openly available or provide a link to a repository. |
| Open Datasets | No | The paper uses StarCraft unit micromanagement as its testbed and mentions TorchCraft for implementation. StarCraft is a commercial game environment, not a publicly available dataset, and the paper does not provide concrete access information for any generated or used dataset. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or sample counts for training, validation, and test dataset splits, nor does it cite predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper states, 'Our implementation uses TorchCraft (Synnaeve et al. 2016) and Torch 7 (Collobert, Kavukcuoglu, and Farabet 2011),' but it does not specify explicit version numbers for these software dependencies or other libraries. |
| Experiment Setup | Yes | The actor consists of 128-bit gated recurrent units (GRUs)... We anneal ϵ linearly from 0.5 to 0.02 across 750 training episodes... We found that the most sensitive parameter was TD(λ), but settled on λ = 0.8... |
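
The linear ϵ schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `linear_epsilon`, the zero-based episode index, and holding ϵ at its final value after episode 750 are all assumptions made for the example.

```python
def linear_epsilon(episode, start=0.5, end=0.02, anneal_episodes=750):
    """Linearly interpolate epsilon from `start` to `end` over `anneal_episodes`.

    Assumptions (not stated in the quoted text): episodes are counted
    from 0, and epsilon stays at `end` once annealing has finished.
    """
    # Clamp the episode index into [0, anneal_episodes].
    clamped = min(max(episode, 0), anneal_episodes)
    frac = clamped / anneal_episodes
    return start + frac * (end - start)


if __name__ == "__main__":
    for ep in (0, 375, 750, 1000):
        print(ep, round(linear_epsilon(ep), 4))
```

The defaults mirror the quoted values (0.5, 0.02, 750); any other schedule shape, such as exponential decay, would need a different interpolation.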