RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization
Authors: Siqi Shen, Chennan Ma, Chao Li, Weiquan Liu, Yongquan Fu, Songzhu Mei, Xinwang Liu, Cheng Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that RiskQ can obtain promising performance through extensive experiments. The source code of RiskQ is available in https://github.com/xmu-rl-3dv/RiskQ. |
| Researcher Affiliation | Academia | Fujian Key Laboratory of Sensing and Computing for Smart Cities, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China; School of Computer, National University of Defense Technology, China |
| Pseudocode | Yes | Algorithm 1: The RiskQ Algorithm |
| Open Source Code | Yes | The source code of RiskQ is available in https://github.com/xmu-rl-3dv/RiskQ. |
| Open Datasets | Yes | We study the performance of RiskQ on risk-sensitive games (Multi-Agent Cliff and Car Following games) and the StarCraft II MARL tasks [16]. |
| Dataset Splits | No | The paper mentions comparing performance and using baselines, but does not specify explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | Experiments are carried out on a cluster consisting of multiple NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software components such as the PyMARL framework, the RMSProp optimizer, QR-DQN, and TD-lambda learning, but does not provide specific version numbers for these software dependencies (e.g., PyMARL version X.Y, TensorFlow version A.B). |
| Experiment Setup | Yes | For RiskQ, unless otherwise specified, the following default configuration is adopted: Wang0.75 is used as the risk measurement. QR-DQN is used to model each agent's stochastic utility, and the quantile number is set to 32. The RMSProp optimizer is employed with a learning rate of 0.001. Batch size and buffer size are set to 32 and 5000, respectively. RiskQ uses TD-lambda learning with λ = 0.6. The ϵ used in ϵ-greedy is annealed from 1 to 0.05 within 100K time steps. |
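
For reproduction, the sketch below collects the quoted default hyperparameters into a single configuration dictionary. It is a minimal sketch; the key names are illustrative assumptions and do not necessarily match the configuration schema used in the RiskQ/PyMARL repository.

```python
# Minimal sketch of the default RiskQ experiment configuration quoted above.
# Key names are illustrative assumptions, not the repository's actual schema.
riskq_default_config = {
    "risk_measure": "wang",          # Wang risk measurement
    "risk_param": 0.75,              # distortion parameter (Wang0.75)
    "n_quantiles": 32,               # QR-DQN quantiles per agent utility
    "optimizer": "rmsprop",
    "learning_rate": 0.001,
    "batch_size": 32,
    "buffer_size": 5000,
    "td_lambda": 0.6,                # TD-lambda learning
    "epsilon_start": 1.0,            # epsilon-greedy exploration
    "epsilon_finish": 0.05,
    "epsilon_anneal_time": 100_000,  # time steps
}
```

As a point of reference for the Wang0.75 risk measurement mentioned in the setup, the following snippet shows one common way to compute a Wang-distorted value from a set of QR-DQN quantile estimates. This is an illustrative, self-contained example rather than the paper's implementation: the function name is hypothetical, and the exact form in which RiskQ applies the distortion (to the CDF levels, as here, or to sampled quantile fractions) is an assumption.

```python
import numpy as np
from scipy.stats import norm


def wang_distorted_value(quantiles: np.ndarray, eta: float = 0.75) -> float:
    """Distorted expectation of a quantile-represented return distribution
    under the Wang transform g(tau) = Phi(Phi^{-1}(tau) + eta).

    `quantiles` holds N quantile estimates (e.g., one QR-DQN output head);
    the value is the reweighted sum over bins: sum_i [g((i+1)/N) - g(i/N)] * q_i.
    """
    n = quantiles.shape[0]
    edges = np.arange(n + 1) / n               # quantile bin edges 0, 1/N, ..., 1
    g = norm.cdf(norm.ppf(edges) + eta)        # Wang distortion of the edges
    weights = np.diff(g)                       # distorted probability mass per bin
    # Quantiles are sorted so bin i aligns with the i-th ascending quantile level.
    return float(np.sum(weights * np.sort(quantiles)))


# Example: 32 quantiles of a standard Gaussian return distribution.
q = norm.ppf((np.arange(32) + 0.5) / 32)
print(wang_distorted_value(q, eta=0.75))
```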
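
Setting `eta = 0` recovers the ordinary expectation over the quantile bins, which gives a quick sanity check that the distortion weights sum to one.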