reproducibilityindex.ai

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Authors: Zhenggang Tang, Chao Yu, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei Fang, Simon Shaolei Du, Yu Wang, Yi Wu

ICLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically show that even with state-of-the-art exploration techniques, PG fails to discover the risky cooperation strategies. In contrast, RPG discovers a surprisingly diverse set of human-interpretable strategies in all these games, including some non-trivial emergent behavior.
Researcher Affiliation	Academia	1 Tsinghua University, 2 Shanghai Qi Zhi Institute, 3 UC Berkeley, 4 UCSD, 5 CMU, 6 Peking University, 7 University of Washington
Pseudocode	Yes	Algorithm 1: RPG: Reward-Randomized Policy Gradient
Open Source Code	Yes	The source code and example videos can be found in our website: https://sites.google.com/view/staghuntrpg.
Open Datasets	Yes	A new multi-agent environment Agar.io, which allows complex multi-agent strategic behavior. We released the environment to the community as a novel testbed for MARL research. [and] We consider two games adapted from Peysakhovich & Lerer (2018b), Monster-Hunt and Escalation.
Dataset Splits	No	The paper states that 'evaluation results are averaged over 100 episodes in gridworlds and 1000 episodes in Agar.io' and that 'We repeat all the experiments with 3 seeds', which describes an evaluation protocol. However, it does not explicitly define distinct training, validation, and test splits with specific percentages or counts.
Hardware Specification	No	The paper describes the training process using PPO, Adam optimizer, and GRU modules, but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions software components like PPO, Adam optimizer, and GRU but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	More optimization hyper-parameter settings are in Tab.6. In addition, Monster-Hunt also utilizes GRU modules to infer opponent's identity during adaption training and the parallel threads are set to 64. [and] More optimization hyper-parameter settings are in Tab.7.
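The Pseudocode row cites Algorithm 1 (RPG: Reward-Randomized Policy Gradient), and the Research Type row contrasts RPG with plain PG on risky cooperation. The toy sketch below illustrates that idea. It is not the authors' Algorithm 1. Every concrete choice in it is an assumption: the one-shot stag-hunt payoffs in `ORIGINAL`, the uniform box used to sample perturbed payoffs, the candidate count, the step counts, and the use of exact expected-reward gradients on a single Stag logit per agent in place of sampled PPO-style policy gradients. All function names are hypothetical. The sketch follows four steps: sample perturbed rewards, train a policy pair with PG on each, keep the pair that scores best under the original reward, and then fine-tune that pair on the original game.

```python
import math
import random

# Hypothetical symmetric stag-hunt payoffs (a, b, c, d) seen by one player:
#   both Stag -> a, I play Hare vs. Stag -> b, I play Stag vs. Hare -> c,
#   both Hare -> d. Assumed values: hunting the stag alone is very risky.
ORIGINAL = (4.0, 3.0, -10.0, 1.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_reward(payoff, p_self, p_other):
    """Expected payoff when each player picks Stag with the given probability."""
    a, b, c, d = payoff
    return (p_self * p_other * a + p_self * (1 - p_other) * c
            + (1 - p_self) * p_other * b + (1 - p_self) * (1 - p_other) * d)


def policy_gradient(payoff, thetas, steps=500, lr=1.0):
    """Independent gradient ascent on each player's Stag logit.

    Exact-gradient stand-in for sampled PG: dJ/dtheta = dJ/dp * p * (1 - p).
    """
    t1, t2 = thetas
    a, b, c, d = payoff
    for _ in range(steps):
        p1, p2 = sigmoid(t1), sigmoid(t2)
        g1 = (p2 * (a - b) + (1 - p2) * (c - d)) * p1 * (1 - p1)
        g2 = (p1 * (a - b) + (1 - p1) * (c - d)) * p2 * (1 - p2)
        t1, t2 = t1 + lr * g1, t2 + lr * g2
    return t1, t2


def score(payoff, thetas):
    """Player 1's expected reward for a pair of Stag logits."""
    return expected_reward(payoff, sigmoid(thetas[0]), sigmoid(thetas[1]))


def rpg(num_candidates=50, seed=0):
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_candidates):
        # 1. Sample a perturbed reward (the uniform box is an assumption).
        payoff = tuple(rng.uniform(-10.0, 10.0) for _ in range(4))
        # 2. Train a policy pair with PG on the perturbed game.
        candidates.append(policy_gradient(payoff, (0.0, 0.0)))
    # 3. Evaluate every candidate under the ORIGINAL reward and keep the best.
    best = max(candidates, key=lambda th: score(ORIGINAL, th))
    # 4. Fine-tune the selected pair on the original game.
    return policy_gradient(ORIGINAL, best)


if __name__ == "__main__":
    pg = policy_gradient(ORIGINAL, (0.0, 0.0))
    print("PG :", round(score(ORIGINAL, pg), 3))
    print("RPG:", round(score(ORIGINAL, rpg()), 3))
```

Under these assumed payoffs, plain PG started from a 50/50 policy drifts toward the safe Hare/Hare outcome. This happens because the gradient at p = 0.5 is negative. Some randomized payoffs make Stag attractive, so PG on those perturbed games reaches Stag/Stag. That pair is then selected by its original-game score and kept there by fine-tuning.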