Mutual-Information Regularized Multi-Agent Policy Iteration
Authors: Jiangxing Wang, Deheng Ye, Zongqing Lu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our method demonstrates strong zero-shot generalization to dynamic team compositions in complex cooperative tasks. To empirically justify our algorithm, we first evaluate the performance of MIPI in a simple yet challenging matrix game. Then, we move to a more complicated scenario, StarCraft Micromanagement Tasks. |
| Researcher Affiliation | Collaboration | Jiangxing Wang, School of Computer Science, Peking University, jiangxiw@stu.pku.edu.cn; Deheng Ye, Tencent Inc., dericye@tencent.com; Zongqing Lu, School of Computer Science, Peking University & BAAI, zongqing.lu@pku.edu.cn |
| Pseudocode | No | The paper describes algorithms and methods mathematically and in text but does not include a formal 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper states: 'Our implementation of MIPI is based on REFIL (Iqbal et al., 2021) with MIT license.' This indicates that REFIL is open source, but does not explicitly state that the authors' MIPI code itself is open source or provide a link to it. |
| Open Datasets | Yes | We first evaluate the performance of MIPI in a simple yet challenging matrix game. Then, we move to a more complicated scenario, StarCraft Micromanagement Tasks (SMAC) (Samvelyan et al., 2019)... In this section, we further evaluate MIPI on Resource Collection, which is a more challenging scenario in terms of the level of collaboration used by COPA (Liu et al., 2021). |
| Dataset Splits | Yes | During training, we train these two agents under team compositions (A, B) and (B, A)... During training, the maps randomly initialize 3-5 agents and the same number of enemies at the start of each episode. |
| Hardware Specification | Yes | All models are built by PyTorch and are trained via 1 Nvidia RTX 1060 GPU to conduct all the experiments. ... All models are built by PyTorch and are trained via a mixture of 4 Nvidia A100, 4 RTX 3090, and 1 RTX 2080 Ti GPUs to conduct all the experiments. ... All models are built by PyTorch and are trained via 4 Nvidia RTX 3090 GPUs to conduct all the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or for any other software dependencies. |
| Experiment Setup | Yes | In the matrix game, we use a learning rate of 3 × 10⁻⁴ for all algorithms. For the algorithm that uses mutual information as the augmented reward, we set the number of Blahut-Arimoto iterations to 1. For algorithms that use mutual information and entropy as the augmented reward, we fix α as 0.5. The batch size used in the experiment is 64. |
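
The Experiment Setup row pins down four concrete numbers for the matrix-game experiments. Below is a minimal sketch of how they might be collected into a single reproduction config; the key names are assumptions, and only the values come from the quoted text.

```python
# Hypothetical configuration mirroring the matrix-game setup quoted above.
# Key names are assumptions; only the values are taken from the paper's text.
MATRIX_GAME_CONFIG = {
    "learning_rate": 3e-4,           # "learning rate of 3 × 10⁻⁴ for all algorithms"
    "blahut_arimoto_iterations": 1,  # for the variant using MI as the augmented reward
    "alpha": 0.5,                    # fixed weight when MI and entropy are both used
    "batch_size": 64,
}
```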
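The setup also mentions Blahut-Arimoto iterations for the mutual-information term. As a rough illustration of what one such iteration typically looks like for a mutual-information-regularized policy in a matrix game (this is a generic sketch, not MIPI's actual update; `q_values`, `state_probs`, and the function name are assumptions), the policy and the action marginal can be updated alternately:

```python
import numpy as np

def blahut_arimoto_step(q_values, state_probs, action_prior, alpha=0.5):
    """One generic Blahut-Arimoto iteration for a mutual-information-regularized
    policy. Illustrative sketch only; not the paper's exact update rule."""
    # Policy update: each state's policy tilts the action marginal by exp(Q / alpha).
    logits = np.log(action_prior[None, :] + 1e-12) + q_values / alpha
    policy = np.exp(logits - logits.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)
    # Marginal update: the action prior becomes the state-averaged policy.
    new_prior = state_probs @ policy
    return policy, new_prior

# Toy usage: 4 states, 3 actions; the paper reports a single iteration with α = 0.5.
q = np.random.randn(4, 3)
p_s = np.full(4, 0.25)
prior = np.full(3, 1.0 / 3.0)
for _ in range(1):
    pi, prior = blahut_arimoto_step(q, p_s, prior, alpha=0.5)
```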