Measuring Mutual Policy Divergence for Multi-Agent Sequential Exploration

Authors: Haowen Dou, Lujuan Dang, Zhirong Luan, Badong Chen

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed method outperforms state-of-the-art sequential updating approaches in two challenging multi-agent tasks with various heterogeneous scenarios.
Researcher Affiliation | Academia | 1 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, 2 National Engineering Research Center for Visual Information and Applications, 3 Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; 4 School of Electrical Engineering, Xi'an University of Technology
Pseudocode | Yes | Algorithm 1: Multi-Agent Divergence Policy Optimization (see the illustrative sketch after this table)
Open Source Code | Yes | Source code is available at https://github.com/hwdou6677/MADPO.
Open Datasets | Yes | We evaluate the proposed MADPO on two challenging multi-agent heterogeneous environments, Multi-Agent Mujoco (MA-Mujoco) [de Witt et al., 2020] and Bi-Dex Hands [Chen et al., 2022].
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits; it instead reports experiments on different scenarios and tasks within the two environments.
Hardware Specification | Yes | The experiments were conducted on a PC with an NVIDIA RTX 3090 GPU, an Intel Xeon 64-core CPU, and 64 GB of RAM.
Software Dependencies | No | The paper lists hyperparameters and implies the use of frameworks such as PyTorch (common for this type of research), but it does not give specific software dependencies with version numbers.
Experiment Setup | Yes | For MA-Mujoco, the common hyperparameters are listed in Tab. 1, and the scenario-specific hyperparameters are listed in Tab. 2. For Bi-Dex Hands, the common hyperparameters are listed in Tab. 3, and the scenario-specific hyperparameters are listed in Tab. 4.
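
The paper's Algorithm 1 is not reproduced here. For orientation only, the snippet below is a minimal, hypothetical sketch of a sequential, divergence-regularised per-agent policy update in PyTorch. All names, dimensions, and coefficients (make_policy, KL_COEF, OBS_DIM, etc.) are assumptions made for illustration and are not taken from the paper or the linked repository.

```python
# Illustrative sketch only: a generic sequential, divergence-regularised
# per-agent policy update. It is NOT the paper's Algorithm 1 (MADPO);
# all hyperparameters and network sizes below are placeholder assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical, kl_divergence

OBS_DIM, N_ACTIONS, N_AGENTS = 8, 4, 3   # assumed toy dimensions
KL_COEF, LR = 0.01, 3e-4                 # assumed coefficients

def make_policy():
    return nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))

policies = [make_policy() for _ in range(N_AGENTS)]
optims = [torch.optim.Adam(p.parameters(), lr=LR) for p in policies]

def sequential_update(obs, actions, advantages):
    """Update agents one after another; each agent's policy-gradient loss is
    combined with the KL divergence from the previously updated agent's
    action distribution to encourage behavioural diversity."""
    prev_dist = None
    for i, (policy, optim) in enumerate(zip(policies, optims)):
        dist = Categorical(logits=policy(obs))
        pg_loss = -(dist.log_prob(actions[:, i]) * advantages).mean()
        kl_term = torch.tensor(0.0)
        if prev_dist is not None:
            kl_term = kl_divergence(prev_dist, dist).mean()
        # Subtracting the KL term rewards divergence from the previous agent.
        loss = pg_loss - KL_COEF * kl_term
        optim.zero_grad()
        loss.backward()
        optim.step()
        with torch.no_grad():
            prev_dist = Categorical(logits=policy(obs))

# Toy batch: random observations, per-agent actions, and advantages.
obs = torch.randn(32, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (32, N_AGENTS))
advantages = torch.randn(32)
sequential_update(obs, actions, advantages)
```

This sketch only shows one generic way to couple sequential updates with a policy-divergence term; the actual MADPO objective and update order should be taken from the paper and the code at https://github.com/hwdou6677/MADPO.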