Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Authors: Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms several baselines and state-of-the-art algorithms by learning in fewer steps and converging to higher returns. |
| Researcher Affiliation | Academia | Filippos Christianos, School of Informatics, University of Edinburgh, f.christianos@ed.ac.uk; Lukas Schäfer, School of Informatics, University of Edinburgh, l.schaefer@ed.ac.uk; Stefano V. Albrecht, School of Informatics, University of Edinburgh, s.albrecht@ed.ac.uk |
| Pseudocode | Yes | Algorithm 1 Shared Experience Actor-Critic Framework |
| Open Source Code | Yes | We provide open-source implementations of SEAC in www.github.com/uoe-agents/seac |
| Open Datasets | Yes | We provide open-source implementations of SEAC in www.github.com/uoe-agents/seac and our two newly developed multi-agent environments: www.github.com/uoe-agents/lb-foraging (LBF) and www.github.com/uoe-agents/robotic-warehouse (RWARE). |
| Dataset Splits | No | The paper mentions "hyperparameter tuning for IAC on RWARE" but does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) within the main text. |
| Hardware Specification | No | The paper does not explicitly describe the hardware specifications (e.g., specific GPU/CPU models, memory amounts) used to run the experiments. |
| Software Dependencies | No | The paper discusses algorithms and frameworks such as AC and DQN, and implicitly references PyTorch through citations, but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | For all tested algorithms, we implement AC using n-step returns and synchronous environments [22]. Specifically, 5-step returns were used and four environments were sampled and passed in batches to the optimiser. An entropy regularisation term was added to the final policy loss [22], computing the entropy of the policy of each individual agent. Hence, the entropy term of agent i, given by H(π(o^i_t; φ_i)), only considers its own policy. High computational requirements in terms of environment steps only allowed hyperparameter tuning for IAC on RWARE; all tested AC algorithms use the same hyperparameters (see Appendix B). |
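The Pseudocode row above refers to Algorithm 1, the Shared Experience Actor-Critic framework. As a rough illustration of what sharing experience means here, below is a minimal PyTorch-style sketch of a policy loss for one agent that also reuses the other agents' trajectories via an importance ratio; the function name `seac_policy_loss`, the `agents`/`batches` interfaces, and the exact weighting form are illustrative assumptions, not the authors' released implementation.

```python
import torch

def seac_policy_loss(agent_i, agents, batches, lam=1.0):
    """Hypothetical sketch of a shared-experience policy loss for agent i.

    Assumes agents[k].pi(obs) returns a torch.distributions.Categorical and
    batches[k] holds agent k's rollout tensors: obs, act, and adv (advantage
    estimates). Names and signatures are illustrative, not the released code.
    """
    # On-policy term: agent i learns from its own experience as usual.
    own = batches[agent_i]
    dist_i = agents[agent_i].pi(own.obs)
    loss = -(dist_i.log_prob(own.act) * own.adv).mean()

    # Shared term: agent i also learns from the other agents' trajectories,
    # reweighted by an importance ratio pi_i(a|o) / pi_k(a|o) because that
    # data was collected under a different policy. In the paper's formulation
    # the advantage on shared data is computed with agent i's own critic;
    # `other.adv` stands in for that quantity here.
    for k, other in enumerate(batches):
        if k == agent_i:
            continue
        dist_ik = agents[agent_i].pi(other.obs)
        log_prob_i = dist_ik.log_prob(other.act)
        with torch.no_grad():
            log_prob_k = agents[k].pi(other.obs).log_prob(other.act)
            ratio = torch.exp(log_prob_i - log_prob_k)  # treated as a constant weight
        loss = loss - lam * (ratio * log_prob_i * other.adv).mean()
    return loss
```

The intent of the shared term is that each agent treats the other agents' on-policy data as off-policy data for itself, corrected by the ratio of its own action probability to the behaviour agent's.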
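The Experiment Setup row quotes the use of 5-step returns, batches drawn from four synchronous environments, and an entropy regularisation term computed from each agent's own policy. The sketch below shows what such a per-agent n-step actor-critic loss could look like; the `rollout` field names, the value-loss coefficient, and the entropy coefficient are placeholders rather than values taken from the paper or Appendix B.

```python
import torch

def nstep_ac_loss(policy, value_net, rollout, gamma=0.99, ent_coef=0.01):
    """Illustrative n-step actor-critic loss with per-agent entropy regularisation.

    `rollout` is assumed to hold tensors for one agent gathered from the four
    synchronous environments: obs [T, B, ...], act [T, B], rew [T, B], and
    last_obs [B, ...] for bootstrapping. The rollout length T plays the role
    of the 5-step horizon; episode-termination masking is omitted for brevity.
    """
    T = rollout.rew.shape[0]
    with torch.no_grad():
        # Bootstrap the return from the critic's estimate of the state reached
        # after the last step of the rollout.
        running = value_net(rollout.last_obs).squeeze(-1)        # [B]
    returns = torch.zeros_like(rollout.rew)
    for t in reversed(range(T)):
        running = rollout.rew[t] + gamma * running
        returns[t] = running

    values = value_net(rollout.obs).squeeze(-1)                  # [T, B]
    dist = policy(rollout.obs)                                   # Categorical over actions
    advantage = (returns - values).detach()

    policy_loss = -(dist.log_prob(rollout.act) * advantage).mean()
    value_loss = (returns - values).pow(2).mean()
    # Entropy regularisation uses only this agent's own policy, matching the
    # per-agent entropy term described in the setup.
    entropy = dist.entropy().mean()
    return policy_loss + 0.5 * value_loss - ent_coef * entropy
```

In SEAC, a per-agent loss of this kind would then be extended with the shared-experience terms sketched above.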