RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

Authors: Wei Qiu, Xinrun Wang, Runsheng Yu, Rundong Wang, Xu He, Bo An, Svetlana Obraztsova, Zinovi Rabinovich

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that our method outperforms many state-of-the-art methods on various multi-agent risk-sensitive navigation scenarios and challenging StarCraft II cooperative tasks, demonstrating enhanced coordination and revealing improved sample efficiency.
Researcher Affiliation | Academia | Wei Qiu1, Xinrun Wang1, Runsheng Yu2, Xu He1, Rundong Wang1, Bo An1, Svetlana Obraztsova1, Zinovi Rabinovich1; 1Nanyang Technological University, Singapore; 2Hong Kong University of Science and Technology, Hong Kong
Pseudocode | Yes | Algorithm 1: RMIX
Open Source Code | Yes | Our code can be found at this link: https://github.com/yetanotherpolicy/rmix.
Open Datasets | Yes | MACN. We customize the cliff walking environment [44] in single-agent domain and develop Multi Agent Cliff Navigation (MACN) for multi-agent risk-sensitive navigation. SC II. SMAC [39] is a challenging set of cooperative SCII maps for micromanagement MARL research.
Dataset Splits | No | The paper mentions using specific environments (MACN, SMAC) and running experiments with multiple random seeds, but it does not specify explicit train, validation, or test dataset splits (e.g., percentages or counts of data points for each split).
Hardware Specification | Yes | We carry out experiments on NVIDIA Tesla V100 GPU 16G and NVIDIA GeForce RTX 3090 24G.
Software Dependencies | No | The paper mentions 'We implement our method on PyMARL [39]' but does not provide specific version numbers for PyMARL or any other software dependencies, such as Python, PyTorch/TensorFlow, or CUDA versions.
Experiment Setup | No | The paper mentions 'We use 5 random seeds to train each method' and refers to 'Appendix C' for 'More training details', implying that specific experimental setup details such as concrete hyperparameter values or detailed training configurations are not fully present in the main text.
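
The five-seed protocol quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the paper or from PyMARL: `run_with_seeds`, `dummy_train`, and the seed values 0 to 4 are hypothetical, and the "score" is a fake number standing in for a final win rate or return.

```python
import random
import statistics
from typing import Callable, Dict, Sequence


def run_with_seeds(train_fn: Callable[[int], float],
                   seeds: Sequence[int] = (0, 1, 2, 3, 4)) -> Dict[str, object]:
    """Run train_fn once per seed and summarize the final scores."""
    scores = [train_fn(seed) for seed in seeds]
    return {
        "seeds": list(seeds),
        "scores": scores,
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }


def dummy_train(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a fake score."""
    rng = random.Random(seed)  # local RNG, so runs do not share state
    return rng.uniform(0.5, 1.0)


summary = run_with_seeds(dummy_train)
print(summary["seeds"], summary["mean"], summary["stdev"])
```

Recording the seed list next to the aggregated scores makes it possible to repeat a run exactly, which is the kind of detail the rows above flag as missing from the main text.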