Counterfactual Multi-Agent Policy Gradients

Authors: Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate COMA in the testbed of StarCraft unit micromanagement... COMA significantly improves average performance over other multi-agent actor-critic methods in this setting...
Researcher Affiliation | Academia | Jakob N. Foerster, University of Oxford, United Kingdom, jakob.foerster@cs.ox.ac.uk; Gregory Farquhar, University of Oxford, United Kingdom, gregory.farquhar@cs.ox.ac.uk; Triantafyllos Afouras, University of Oxford, UK, afourast@robots.ox.ac.uk; Nantas Nardelli, University of Oxford, UK, nantas@robots.ox.ac.uk; Shimon Whiteson, University of Oxford, UK, shimon.whiteson@cs.ox.ac.uk
Pseudocode | Yes | Pseudocode and further details on the training procedure are in the supplementary material.
Open Source Code | No | The paper mentions 'Pseudocode and further details on the training procedure are in the supplementary material,' but does not state that the source code for its method is openly available, nor does it link to a repository.
Open Datasets | No | The paper uses StarCraft unit micromanagement as its testbed and mentions TorchCraft for implementation. StarCraft is a commercial game environment, not a publicly available dataset, and the paper does not provide concrete access information for any generated or used dataset.
Dataset Splits | No | The paper does not give percentages or sample counts for training, validation, and test splits, nor does it cite predefined splits.
Hardware Specification | No | The paper does not provide hardware details such as GPU or CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper states, 'Our implementation uses TorchCraft (Synnaeve et al. 2016) and Torch 7 (Collobert, Kavukcuoglu, and Farabet 2011),' but it does not give version numbers for these dependencies or for any other libraries.
Experiment Setup | Yes | The actor consists of 128-bit gated recurrent units (GRUs)... We anneal ϵ linearly from 0.5 to 0.02 across 750 training episodes... We found that the most sensitive parameter was TD(λ), but settled on λ = 0.8...
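
The two hyperparameters quoted under Experiment Setup can be illustrated with a minimal Python sketch. It is not the authors' implementation (which uses Torch 7 and is not released). The function names, the choice to hold ϵ at 0.02 after episode 750, the discount factor gamma, and the backward-recursive form of the λ-return are all assumptions made for illustration. How ϵ is consumed by the policy is outside the scope of the sketch.

```python
from typing import List, Sequence


def linear_epsilon(episode: int, start: float = 0.5, end: float = 0.02,
                   anneal_episodes: int = 750) -> float:
    """Linearly interpolate epsilon from `start` to `end` over `anneal_episodes`.

    Assumption: epsilon is held at `end` once annealing has finished.
    """
    clipped = min(max(episode, 0), anneal_episodes)
    frac = clipped / anneal_episodes
    return start + frac * (end - start)


def td_lambda_returns(rewards: Sequence[float], next_values: Sequence[float],
                      gamma: float, lam: float = 0.8) -> List[float]:
    """Compute lambda-returns for one episode by backward recursion.

    rewards[t] is the reward received after step t.
    next_values[t] is a critic estimate for the state reached after step t
    (use 0.0 for a terminal state). `gamma` is a hypothetical discount
    factor; the excerpt above does not state one.
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    returns = [0.0] * len(rewards)
    # Bootstrap from the last value estimate (0.0 for an empty episode).
    next_return = next_values[-1] if rewards else 0.0
    for t in reversed(range(len(rewards))):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * next_values[t] + lam * next_return
        )
        returns[t] = next_return
    return returns
```

With lam = 0 the targets reduce to one-step TD targets, and with lam = 1 they become Monte Carlo returns bootstrapped from the final value estimate. The reported λ = 0.8 sits between the two.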