UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

Authors: Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Boehmer, Shimon Whiteson

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
Researcher Affiliation | Academia | Department of Computer Science, University of Oxford, Oxford, United Kingdom; Department of Software Technology, Delft University of Technology, Delft, Netherlands.
Pseudocode | No | The main text of the paper refers to "Appendix B" for a detailed algorithm, but Appendix B itself is not included in the provided text.
Open Source Code | No | The paper mentions videos of learnt policies available at a URL (https://sites.google.com/view/uneven-marl/) but does not provide a statement or link for the open-source code of their methodology.
Open Datasets | Yes | "We now evaluate UneVEn on challenging cooperative StarCraft II (SC2) maps from the popular SMAC benchmark (Samvelyan et al., 2019)."
Dataset Splits | No | The paper describes training duration in steps and testing with rollouts (e.g., "training for 35k steps", "test 60 rollouts") within simulation environments, but does not provide traditional train/validation/test dataset splits.
Hardware Specification | No | The paper acknowledges "a generous equipment grant from NVIDIA" but does not specify any particular GPU models, CPU models, or other hardware specifications used for running the experiments.
Software Dependencies | No | The paper mentions various algorithms and benchmarks (e.g., VDN, QMIX, StarCraft II), but it does not specify software names with their version numbers required for replication.
Experiment Setup | Yes | "α is annealed from 0.3 to 1.0 in our experiments over a fixed number of steps at the beginning of training. Once this exploration stage is finished (i.e., α = 1), actions are always taken based on the target task's joint action-value function."
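
The quoted setup only states that α is annealed from 0.3 to 1.0 over a fixed number of steps; the shape of the schedule and its length are not given. Below is a minimal sketch, assuming a linear ramp and a hypothetical `anneal_steps` horizon, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): linear annealing of the
# exploration weight alpha from 0.3 to 1.0 over a fixed number of
# environment steps, matching the quoted experiment setup.
# `anneal_steps` is a hypothetical placeholder value.

def alpha_schedule(step: int,
                   start: float = 0.3,
                   end: float = 1.0,
                   anneal_steps: int = 50_000) -> float:
    """Return the annealed alpha for the given training step."""
    if step >= anneal_steps:
        # Exploration stage finished: alpha stays at 1, so actions are
        # taken based on the target task's joint action-value function.
        return end
    frac = step / anneal_steps
    return start + frac * (end - start)


if __name__ == "__main__":
    for step in (0, 25_000, 50_000, 75_000):
        print(step, round(alpha_schedule(step), 3))
```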