RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization

Authors: Siqi Shen, Chennan Ma, Chao Li, Weiquan Liu, Yongquan Fu, Songzhu Mei, Xinwang Liu, Cheng Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that RiskQ can obtain promising performance through extensive experiments. The source code of RiskQ is available at https://github.com/xmu-rl-3dv/RiskQ.
Researcher Affiliation | Academia | (a) Fujian Key Laboratory of Sensing and Computing for Smart Cities, School of Informatics, Xiamen University (XMU), China; (b) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China; (c) School of Computer, National University of Defense Technology, China
Pseudocode | Yes | Algorithm 1: The RiskQ Algorithm
Open Source Code | Yes | The source code of RiskQ is available at https://github.com/xmu-rl-3dv/RiskQ.
Open Datasets | Yes | We study the performance of RiskQ on risk-sensitive games (multi-agent cliff and car-following games) and the StarCraft II MARL tasks [16].
Dataset Splits | No | The paper mentions comparing performance against baselines, but does not specify explicit training/validation/test dataset splits with percentages or sample counts.
Hardware Specification | Yes | Experiments are carried out on a cluster consisting of multiple NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions software components such as the PyMARL framework, the RMSProp optimizer, QR-DQN, and TD(λ) learning, but does not provide specific version numbers for these software dependencies (e.g., PyMARL version X.Y, TensorFlow version A.B).
Experiment Setup | Yes | For RiskQ, unless otherwise specified, the following default configuration is adopted: Wang0.75 is used as the risk measure. QR-DQN is used to model each agent's stochastic utility, and the quantile number is set to 32. The RMSProp optimizer is employed with a learning rate of 0.001. Batch size and buffer size are set to 32 and 5000, respectively. RiskQ uses TD(λ) learning with λ = 0.6. The ϵ used in ϵ-greedy is annealed from 1 to 0.05 within 100K time steps.
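
The reported defaults map naturally onto a small experiment configuration. The Python sketch below is a minimal, hypothetical rendering of those settings, together with the linear ϵ-greedy schedule implied by "annealed from 1 to 0.05 within 100K time steps". The key names (risk_measure, epsilon_anneal_steps, etc.) and the helper epsilon_at are our own illustrative assumptions, not the actual RiskQ or PyMARL configuration keys.

    # Sketch of the reported RiskQ default configuration (key names are assumptions).
    DEFAULT_CONFIG = {
        "risk_measure": "Wang",       # Wang risk measure ...
        "risk_param": 0.75,           # ... with distortion parameter 0.75 (Wang0.75)
        "utility_model": "QR-DQN",    # models each agent's stochastic utility
        "n_quantiles": 32,
        "optimizer": "RMSProp",
        "learning_rate": 1e-3,
        "batch_size": 32,
        "buffer_size": 5000,
        "td_lambda": 0.6,             # TD(lambda) learning
        "epsilon_start": 1.0,
        "epsilon_finish": 0.05,
        "epsilon_anneal_steps": 100_000,
    }

    def epsilon_at(step: int, cfg: dict = DEFAULT_CONFIG) -> float:
        """Linear epsilon-greedy schedule: 1.0 -> 0.05 over the first 100K steps."""
        frac = min(step / cfg["epsilon_anneal_steps"], 1.0)
        return cfg["epsilon_start"] + frac * (cfg["epsilon_finish"] - cfg["epsilon_start"])

    if __name__ == "__main__":
        # e.g. epsilon_at(0) == 1.0, epsilon_at(50_000) == 0.525, epsilon_at(100_000) == 0.05
        print(epsilon_at(0), epsilon_at(50_000), epsilon_at(100_000))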