Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning
Authors: Kyunghwan Son, Junsu Kim, Sungsoo Ahn, Roben D Delos Reyes, Yung Yi, Jinwoo Shin
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that DRIMA significantly outperforms prior state-of-the-art methods across various scenarios in the StarCraft Multi-Agent Challenge environment. Notably, DRIMA shows robust performance where prior methods learn only a highly suboptimal policy, regardless of reward shaping, exploration scheduling, and noisy (random or adversarial) agents. |
| Researcher Affiliation | Academia | Kyunghwan Son¹, Junsu Kim¹, Sungsoo Ahn², Roben Delos Reyes¹, Yung Yi¹, Jinwoo Shin¹. ¹Korea Advanced Institute of Science and Technology (KAIST), ²Pohang University of Science and Technology (POSTECH). Correspondence to: Kyunghwan Son <kevinson9473@kaist.ac.kr>. |
| Pseudocode | Yes | Algorithm 1 DRIMA algorithm |
| Open Source Code | No | The paper provides links to repositories for external tools and baselines (e.g., SMAC, PyMARL, WQMIX, QPLEX, DFAC), but does not provide a specific link or explicit statement about releasing the source code for the DRIMA methodology itself. |
| Open Datasets | Yes | Environments. We mainly evaluate our method on the StarCraft Multi-Agent Challenge (SMAC) environment (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper mentions a replay buffer size and mini-batch size for training, and specifies test episodes, but it does not provide explicit train/validation/test dataset splits (e.g., percentages or exact counts for each split). |
| Hardware Specification | Yes | Using an NVIDIA Titan Xp graphics card, the training time varies from 8 hours to 24 hours for different scenarios. |
| Software Dependencies | Yes | The hyperparameters of training and testing configurations for VDN, QMIX, and QTRAN are the same as in the recent GitHub code of SMAC (Samvelyan et al., 2019) and PyMARL, with StarCraft version SC2.4.6.2.69232. |
| Experiment Setup | Yes | We used the Adam optimizer. For methods other than DRIMA and DFAC, all neural networks are trained using the RMSProp optimizer with a learning rate of 0.0005, following their respective papers. We use ε-greedy action selection with ε decreasing from 1 to 0.05 for exploration, following Samvelyan et al. (2019). For the discount factor, we set γ = 0.99. The replay buffer stores 5000 episodes at most, and the mini-batch size is 32. [...] We set λ_opt = 3 and λ_nopt, λ_ub = 1. (See the configuration sketch below the table.) |
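
The training settings quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below is illustrative only, assuming a PyMARL-style Python setup; the key names (`epsilon_start`, `lambda_opt`, etc.) are our own labels and are not identifiers taken from the authors' code.

```python
# Minimal sketch (not the authors' implementation): the reported DRIMA training
# hyperparameters collected into one dict, as one might pass to a PyMARL-style
# training runner. All key names here are assumptions for illustration.
drima_config = {
    "optimizer": "adam",              # DRIMA and DFAC are trained with Adam
    "baseline_optimizer": "rmsprop",  # other baselines use RMSProp per their papers
    "baseline_lr": 0.0005,            # learning rate reported for the baselines
    "epsilon_start": 1.0,             # ε-greedy exploration, decayed from 1 ...
    "epsilon_finish": 0.05,           # ... down to 0.05 (Samvelyan et al., 2019)
    "gamma": 0.99,                    # discount factor
    "buffer_size": 5000,              # replay buffer capacity, in episodes
    "batch_size": 32,                 # mini-batch size, in episodes
    "lambda_opt": 3.0,                # DRIMA loss weights: λ_opt = 3,
    "lambda_nopt": 1.0,               # λ_nopt = 1,
    "lambda_ub": 1.0,                 # λ_ub = 1
}

if __name__ == "__main__":
    # Print the configuration for a quick sanity check.
    for key, value in drima_config.items():
        print(f"{key}: {value}")
```
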