Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Counterfactual Multi-Agent Policy Gradients
Authors: Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
AAAI 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate COMA in the testbed of StarCraft unit micromanagement... COMA significantly improves average performance over other multi-agent actor-critic methods in this setting... |
| Researcher Affiliation | Academia | Jakob N. Foerster, University of Oxford, United Kingdom; Gregory Farquhar, University of Oxford, United Kingdom; Triantafyllos Afouras, University of Oxford, UK; Nantas Nardelli, University of Oxford, UK; Shimon Whiteson, University of Oxford, UK |
| Pseudocode | Yes | Pseudocode and further details on the training procedure are in the supplementary material. |
| Open Source Code | No | The paper mentions 'Pseudocode and further details on the training procedure are in the supplementary material,' but does not explicitly state that the source code for their methodology is openly available or provide a link to a repository. |
| Open Datasets | No | The paper uses StarCraft unit micromanagement as its testbed and mentions TorchCraft for implementation. StarCraft is a commercial game environment, not a publicly available dataset, and the paper does not provide concrete access information for any generated or used dataset. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or sample counts for training, validation, and test dataset splits, nor does it cite predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper states, 'Our implementation uses TorchCraft (Synnaeve et al. 2016) and Torch 7 (Collobert, Kavukcuoglu, and Farabet 2011),' but it does not specify explicit version numbers for these software dependencies or other libraries. |
| Experiment Setup | Yes | The actor consists of 128-bit gated recurrent units (GRUs)... We anneal ϵ linearly from 0.5 to 0.02 across 750 training episodes... We found that the most sensitive parameter was TD(λ), but settled on λ = 0.8... |
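The hyperparameters quoted in the Experiment Setup row (linear ε annealing from 0.5 to 0.02 over 750 episodes, and TD(λ) with λ = 0.8) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the hold-constant behavior after annealing, and the bootstrapped backward recursion for λ-returns are assumptions about how such a setup is typically implemented.

```python
def epsilon(episode, eps_start=0.5, eps_end=0.02, anneal_episodes=750):
    """Linear epsilon annealing: eps_start -> eps_end over anneal_episodes,
    then held constant (the hold is an assumption; the paper only states
    the annealing range and duration)."""
    frac = min(episode / anneal_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def td_lambda_targets(rewards, values, gamma=0.99, lam=0.8):
    """Illustrative lambda-return targets via the standard backward recursion
        G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    bootstrapping the final step from V(s_T).
    `values` must have length len(rewards) + 1 (one value per state,
    including the final bootstrap state)."""
    T = len(rewards)
    targets = [0.0] * T
    g = values[T]  # bootstrap from the final state value
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        targets[t] = g
    return targets
```

Setting `lam=1.0` recovers bootstrapped Monte Carlo returns and `lam=0.0` recovers one-step TD(0) targets, which is a quick sanity check on the recursion.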