UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
Authors: Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Boehmer, Shimon Whiteson
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Oxford, Oxford, United Kingdom; Department of Software Technology, Delft University of Technology, Delft, Netherlands. |
| Pseudocode | No | The main text of the paper refers to "Appendix B" for a detailed algorithm, but Appendix B itself is not included in the provided text. |
| Open Source Code | No | The paper mentions videos of learnt policies available at a URL (https://sites.google.com/view/uneven-marl/) but does not provide a statement or link for the open-source code of their methodology. |
| Open Datasets | Yes | We now evaluate UneVEn on challenging cooperative StarCraft II (SC2) maps from the popular SMAC benchmark (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper describes training duration in steps and testing with rollouts (e.g., 'training for 35k steps', 'test 60 rollouts') within simulation environments, but does not provide traditional train/validation/test dataset splits. |
| Hardware Specification | No | The paper acknowledges 'a generous equipment grant from NVIDIA' but does not specify any particular GPU models, CPU models, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and benchmarks (e.g., VDN, QMIX, StarCraft II), but it does not specify software names with their version numbers required for replication. |
| Experiment Setup | Yes | α is annealed from 0.3 to 1.0 in our experiments over a fixed number of steps at the beginning of training. Once this exploration stage is finished (i.e., α = 1), actions are always taken based on the target task's joint action-value function. |
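
The quoted setup implies a simple annealing schedule for α followed by a hard switch to target-task action selection. Below is a minimal Python sketch of that logic; the paper excerpt does not state the schedule's shape or duration, so the linear interpolation and the constants `ALPHA_START`, `ALPHA_END`, and `ANNEAL_STEPS` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the alpha-annealing schedule described above.
# ASSUMPTIONS: a linear schedule and the constants below are illustrative;
# the paper only says alpha is annealed from 0.3 to 1.0 over a fixed
# number of steps at the beginning of training.

ALPHA_START = 0.3      # initial value of alpha (from the paper)
ALPHA_END = 1.0        # final value: exploration stage is over
ANNEAL_STEPS = 50_000  # hypothetical length of the annealing window


def alpha_at(step: int) -> float:
    """Linearly anneal alpha from ALPHA_START to ALPHA_END over ANNEAL_STEPS."""
    frac = min(step / ANNEAL_STEPS, 1.0)
    return ALPHA_START + frac * (ALPHA_END - ALPHA_START)


def use_target_task_only(step: int) -> bool:
    """Once alpha reaches 1.0, actions are always selected from the
    target task's joint action-value function."""
    return alpha_at(step) >= ALPHA_END
```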