Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization
Authors: Siqi Shen, Chennan Ma, Chao Li, Weiquan Liu, Yongquan Fu, Songzhu Mei, Xinwang Liu, Cheng Wang
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that RiskQ can obtain promising performance through extensive experiments. The source code of RiskQ is available in https://github.com/xmu-rl-3dv/RiskQ. |
| Researcher Affiliation | Academia | (a) Fujian Key Laboratory of Sensing and Computing for Smart Cities, School of Informatics, Xiamen University (XMU), China; (b) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China; (c) School of Computer, National University of Defense Technology, China |
| Pseudocode | Yes | Algorithm 1 The RiskQ Algorithm |
| Open Source Code | Yes | The source code of RiskQ is available in https://github.com/xmu-rl-3dv/RiskQ. |
| Open Datasets | Yes | We study the performance of RiskQ on risk-sensitive games (Multi-agent cliff and Car following games), the StarCraft II MARL tasks [16]. |
| Dataset Splits | No | The paper mentions comparing performance and using baselines, but does not specify explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | Experiments are carried out on a clusters consists of multiple NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'PyMARL framework', 'RMSProp optimizer', 'QR-DQN', and 'TD-lambda learning', but does not provide specific version numbers for these software dependencies (e.g., PyMARL version X.Y, TensorFlow version A.B). |
| Experiment Setup | Yes | For RiskQ, unless otherwise specified, the following default configuration is adopted: Wang0.75 is used as the risk measurement. QR-DQN is used to model per-agent's stochastic utility, and the quantile number is set to 32. The RMSProp optimizer is employed with a learning rate of 0.001. Batch size and buffer size are set to 32 and 5000, respectively. RiskQ uses TD-lambda learning with λ = 0.6. The ϵ used in ϵ-greedy annealed from 1 to 0.05 within 100K time steps. |
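
The default configuration quoted in the Experiment Setup row can be collected into a small Python sketch for anyone attempting a reproduction. The values are taken from the table; the class name, the field names, and the `epsilon_at` helper are hypothetical and do not come from the paper or its repository. The table only states that ϵ is annealed from 1 to 0.05 within 100K time steps, so the linear shape of the schedule is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskQDefaults:
    """Default hyperparameters as quoted in the Experiment Setup row.

    Field names are hypothetical; only the values come from the table.
    """
    risk_measure: str = "Wang0.75"   # risk measurement, label kept as quoted
    n_quantiles: int = 32            # quantile number for QR-DQN
    optimizer: str = "RMSProp"
    learning_rate: float = 0.001
    batch_size: int = 32
    buffer_size: int = 5000
    td_lambda: float = 0.6           # lambda for TD-lambda learning
    epsilon_start: float = 1.0
    epsilon_finish: float = 0.05
    epsilon_anneal_steps: int = 100_000


def epsilon_at(step: int, cfg: RiskQDefaults = RiskQDefaults()) -> float:
    """Return the exploration rate at a given time step.

    Assumption: the annealing is linear; the table does not state the shape.
    Steps outside [0, epsilon_anneal_steps] are clamped to that range.
    """
    clamped = min(max(step, 0), cfg.epsilon_anneal_steps)
    frac = clamped / cfg.epsilon_anneal_steps
    return cfg.epsilon_start + frac * (cfg.epsilon_finish - cfg.epsilon_start)
```

Under the linear assumption, `epsilon_at(50_000)` lies halfway between the start and finish values, and any step beyond 100K stays at the final rate of 0.05.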