Measuring Mutual Policy Divergence for Multi-Agent Sequential Exploration

Authors: Haowen Dou, Lujuan Dang, Zhirong Luan, Badong Chen

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed method outperforms state-of-the-art sequential updating approaches in two challenging multi-agent tasks with various heterogeneous scenarios.
Researcher Affiliation | Academia | 1 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, 2 National Engineering Research Center for Visual Information and Applications, 3 Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; 4 School of Electrical Engineering, Xi'an University of Technology
Pseudocode | Yes | Algorithm 1: Multi-Agent Divergence Policy Optimization (see the illustrative sketch after this table)
Open Source Code | Yes | Source code is available at https://github.com/hwdou6677/MADPO.
Open Datasets | Yes | We evaluate the proposed MADPO on two challenging multi-agent heterogeneous environments, Multi-Agent Mujoco (MA-Mujoco) [de Witt et al., 2020] and Bi-Dex Hands [Chen et al., 2022].
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits; it instead reports experiments on different scenarios and tasks within the two environments.
Hardware Specification | Yes | The experiments were conducted on a PC with an NVIDIA RTX 3090 GPU, an Intel Xeon 64-core CPU, and 64 GB of RAM.
Software Dependencies | No | The paper lists hyperparameters and implies the use of frameworks such as PyTorch (common for this type of research), but it does not give specific software dependencies with version numbers.
Experiment Setup | Yes | For MA-Mujoco, the common hyperparameters are listed in Tab. 1, and the scenario-specific hyperparameters are listed in Tab. 2. For Bi-Dex Hands, the common hyperparameters are listed in Tab. 3, and the scenario-specific hyperparameters are listed in Tab. 4.
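
The paper's Algorithm 1 is not reproduced here. For orientation only, the snippet below is a minimal, hypothetical sketch of a sequential, divergence-regularised per-agent policy update in PyTorch. All names, dimensions, and coefficients (make_policy, KL_COEF, OBS_DIM, etc.) are assumptions made for illustration and are not taken from the paper or the linked repository.

```python
# Illustrative sketch only: a generic sequential, divergence-regularised
# per-agent policy update. It is NOT the paper's Algorithm 1 (MADPO);
# all hyperparameters and network sizes below are placeholder assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical, kl_divergence

OBS_DIM, N_ACTIONS, N_AGENTS = 8, 4, 3   # assumed toy dimensions
KL_COEF, LR = 0.01, 3e-4                 # assumed coefficients

def make_policy():
    return nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))

policies = [make_policy() for _ in range(N_AGENTS)]
optims = [torch.optim.Adam(p.parameters(), lr=LR) for p in policies]

def sequential_update(obs, actions, advantages):
    """Update agents one after another; each agent's policy-gradient loss is
    combined with the KL divergence from the previously updated agent's
    action distribution to encourage behavioural diversity."""
    prev_dist = None
    for i, (policy, optim) in enumerate(zip(policies, optims)):
        dist = Categorical(logits=policy(obs))
        pg_loss = -(dist.log_prob(actions[:, i]) * advantages).mean()
        kl_term = torch.tensor(0.0)
        if prev_dist is not None:
            kl_term = kl_divergence(prev_dist, dist).mean()
        # Subtracting the KL term rewards divergence from the previous agent.
        loss = pg_loss - KL_COEF * kl_term
        optim.zero_grad()
        loss.backward()
        optim.step()
        with torch.no_grad():
            prev_dist = Categorical(logits=policy(obs))

# Toy batch: random observations, per-agent actions, and advantages.
obs = torch.randn(32, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (32, N_AGENTS))
advantages = torch.randn(32)
sequential_update(obs, actions, advantages)
```

This sketch only shows one generic way to couple sequential updates with a policy-divergence term; the actual MADPO objective and update order should be taken from the paper and the code at https://github.com/hwdou6677/MADPO.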