Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning

Authors: Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Changjie Fan, Fei Wu, Jun Xiao

ICML 2022

Reproducibility Variable Result LLM Response
Research Type Experimental To evaluate the effectiveness of the deconfounded training, we apply our approach to three popular value decomposition baselines, including QMIX (Rashid et al., 2018), QPLEX (Wang et al., 2021a) and RODE (Wang et al., 2021b). We show that each baseline enjoys a significant improvement. We carry out the experiments with different scenarios on two benchmarks, the StarCraft II micromanagement challenge (SMAC) (Samvelyan et al., 2019) and the multi-agent coordination challenge (MACO) (Wang et al., 2022). (A minimal SMAC rollout sketch is included after the table.)
Researcher Affiliation Collaboration 1 DCD Lab, College of Computer Science, Zhejiang University; 2 The Chinese University of Hong Kong, Shenzhen; 3 Shenzhen Institute of Artificial Intelligence and Robotics for Society; 4 Huawei Noah's Ark Lab; 5 Fuxi AI Lab, NetEase Games; 6 Shanghai Institute for Advanced Study of Zhejiang University; 7 Shanghai AI Laboratory.
Pseudocode Yes Algorithm 1 Deconfounded Value Decomposition
Open Source Code No The paper states: 'We implement these baselines and corresponding deconfounded training via PyMARL (Samvelyan et al., 2019).' This indicates the method was implemented within the PyMARL framework, but the paper does not state that the authors' own DVD implementation is open-sourced, nor does it provide a link to it.
Open Datasets Yes We carry out the experiments with different scenarios on two benchmarks, the StarCraft II micromanagement challenge (SMAC) (Samvelyan et al., 2019) and the multi-agent coordination challenge (MACO) (Wang et al., 2022).
Dataset Splits No The paper specifies training parameters such as 'Batch size 32', 'Replay buffer size 5000', and 'Evaluation interval 10000 time steps', but does not provide explicit percentages or counts for training, validation, and test dataset splits.
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies No The paper mentions implementing the baselines via 'PyMARL (Samvelyan et al., 2019)', but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup Yes The experiment configurations are shown in Table 3. The other hyper-parameters of the baselines are set the same as those in SMAC. ... Table 3. Common settings of different methods (setting: MACO / StarCraft II):
Batch size: 32 / 32
Exploration time steps: 500000 / 50000 (500000 for super hard maps)
Start exploration rate: 1 / 1
End exploration rate: 0.05 / 0.05
TD-loss discount: 0.9 / 0.9
Target central critic update interval: 200 episodes / 200 episodes
Evaluation interval: 10000 time steps / 10000 time steps
Evaluation battle number: 300 episodes / 32 episodes
Learning rate: 0.0005 / 0.0005
Optimizer: RMSProp / RMSProp (Adam for Corridor and 6h_vs_8z)
Sampling times D: 4 / 8
(A hedged configuration sketch based on these values follows the table.)