Mutual-Information Regularized Multi-Agent Policy Iteration
Authors: Jiangxing Wang, Deheng Ye, Zongqing Lu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our method demonstrates strong zero-shot generalization to dynamic team compositions in complex cooperative tasks. To empirically justify our algorithm, we first evaluate the performance of MIPI in a simple yet challenging matrix game. Then, we move to a more complicated scenario, StarCraft Micromanagement Tasks. |
| Researcher Affiliation | Collaboration | Jiangxing Wang, School of Computer Science, Peking University, jiangxiw@stu.pku.edu.cn; Deheng Ye, Tencent Inc., dericye@tencent.com; Zongqing Lu, School of Computer Science, Peking University & BAAI, zongqing.lu@pku.edu.cn |
| Pseudocode | No | The paper describes algorithms and methods mathematically and in text but does not include a formal 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper states: 'Our implementation of MIPI is based on REFIL (Iqbal et al., 2021) with MIT license.' This indicates that REFIL is open source, but does not explicitly state that the authors' MIPI code itself is open source or provide a link to it. |
| Open Datasets | Yes | We first evaluate the performance of MIPI in a simple yet challenging matrix game. Then, we move to a more complicated scenario, StarCraft Micromanagement Tasks (SMAC) (Samvelyan et al., 2019)... In this section, we further evaluate MIPI on Resource Collection, which is a more challenging scenario in terms of the level of collaboration used by COPA (Liu et al., 2021). |
| Dataset Splits | Yes | During training, we train these two agents under team compositions (A, B) and (B, A)... During training, the maps randomly initialize 3-5 agents and the same number of enemies at the start of each episode. |
| Hardware Specification | Yes | All models are built by PyTorch and are trained via 1 Nvidia RTX 1060 GPU to conduct all the experiments. ... All models are built by PyTorch and are trained via a mixture of 4 Nvidia A100, 4 RTX 3090, and 1 RTX 2080 Ti GPUs to conduct all the experiments. ... All models are built by PyTorch and are trained via 4 Nvidia RTX 3090 GPUs to conduct all the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or for any other software dependencies. |
| Experiment Setup | Yes | In the matrix game, we use a learning rate of 3 × 10⁻⁴ for all algorithms. For the algorithm that uses mutual information as the augmented reward, we set the number of Blahut-Arimoto iterations to 1. For algorithms that use mutual information and entropy as the augmented reward, we fix α as 0.5. The batch size used in the experiment is 64. |
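
The Experiment Setup row pins down four concrete numbers for the matrix-game experiments. Below is a minimal sketch of how they might be collected into a single reproduction config; the key names are assumptions, and only the values come from the quoted text.

```python
# Hypothetical configuration mirroring the matrix-game setup quoted above.
# Key names are assumptions; only the values are taken from the paper's text.
MATRIX_GAME_CONFIG = {
    "learning_rate": 3e-4,           # "learning rate of 3 × 10⁻⁴ for all algorithms"
    "blahut_arimoto_iterations": 1,  # for the variant using MI as the augmented reward
    "alpha": 0.5,                    # fixed weight when MI and entropy are both used
    "batch_size": 64,
}
```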
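The setup also mentions Blahut-Arimoto iterations for the mutual-information term. As a rough illustration of what one such iteration typically looks like for a mutual-information-regularized policy in a matrix game (this is a generic sketch, not MIPI's actual update; `q_values`, `state_probs`, and the function name are assumptions), the policy and the action marginal can be updated alternately:

```python
import numpy as np

def blahut_arimoto_step(q_values, state_probs, action_prior, alpha=0.5):
    """One generic Blahut-Arimoto iteration for a mutual-information-regularized
    policy. Illustrative sketch only; not the paper's exact update rule."""
    # Policy update: each state's policy tilts the action marginal by exp(Q / alpha).
    logits = np.log(action_prior[None, :] + 1e-12) + q_values / alpha
    policy = np.exp(logits - logits.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)
    # Marginal update: the action prior becomes the state-averaged policy.
    new_prior = state_probs @ policy
    return policy, new_prior

# Toy usage: 4 states, 3 actions; the paper reports a single iteration with α = 0.5.
q = np.random.randn(4, 3)
p_s = np.full(4, 0.25)
prior = np.full(3, 1.0 / 3.0)
for _ in range(1):
    pi, prior = blahut_arimoto_step(q, p_s, prior, alpha=0.5)
```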