Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning

Authors: Meng Zhou, Ziyu Liu, Pengwei Sui, Yixuan Li, Yuk Ying Chung

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our algorithm, referred to as LICA, is evaluated on several benchmarks including the multi-agent particle environments and a set of challenging StarCraft II micromanagement tasks, and we show that LICA significantly outperforms previous methods. We benchmark our methods on two sets of cooperative environments, the Multi-Agent Particle Environments [27] and the StarCraft Multi-Agent Challenge [39], and we observe considerable performance improvements over previous state-of-the-art algorithms. We also conduct further component studies to demonstrate that (1) compared to difference reward based credit assignment approaches (e.g. [10]), LICA has higher representational capacity and can readily handle environments where multiple global optima exist, and (2) our adaptive entropy regularization is crucial for encouraging sustained exploration and can lead to faster policy convergence in complex scenarios. (A hedged sketch of an adaptively scaled entropy bonus appears after the table.)
Researcher Affiliation | Academia | Meng Zhou, Ziyu Liu, Pengwei Sui, Yixuan Li, Yuk Ying Chung (The University of Sydney)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/mzho7212/LICA.
Open Datasets | Yes | We first evaluate our algorithm against previous state-of-the-art methods on two common multi-agent particle environments [27]: Predator-Prey and Cooperative Navigation. We thus further evaluate LICA on several SC2 micromanagement tasks from the SMAC [39] benchmark.
Dataset Splits | No | The paper states that agents are "trained for 5000 episodes" for particle environments and describes total steps for StarCraft II scenarios ("32 million steps for Easy... and 64 million for Hard and Super Hard"), but does not provide specific training, validation, and test dataset splits (e.g., percentages or exact counts for each split) in the main text.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | Yes | our experiments are based on the latest PyMARL framework which uses SC2.4.10 while the original results reported in [39] use SC2.4.6. As indicated by the original authors, performance is not always comparable across SC2 versions.
Experiment Setup | Yes | For all environments, agents are trained for 5000 episodes, each with a maximum of 200 steps (episodes may end early for Predator-Prey). Each algorithm is trained with the same 5 random seeds and the mean and standard deviation of the goal metric... All methods also use GRU modules (Fig. 1) and share the parameters of the individual agent networks. (A hedged sketch of a shared-parameter GRU agent follows the table.)
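
The Research Type row above highlights LICA's adaptive entropy regularization for sustained exploration. Below is a minimal PyTorch sketch of one way an adaptively scaled entropy bonus can be wired into a discrete-action policy loss. The function name, the coefficient value, and the specific normalization (dividing the bonus by the detached current entropy) are illustrative assumptions, not the paper's exact formulation; `advantages` is a generic stand-in for whatever learning signal the centralized critic provides.

```python
import torch
import torch.nn.functional as F

def adaptive_entropy_policy_loss(logits, actions, advantages,
                                 entropy_coef=0.03, eps=1e-8):
    """Illustrative policy loss with an adaptively scaled entropy bonus.

    logits:     (batch, n_actions) raw policy outputs
    actions:    (batch,) indices of the actions actually taken
    advantages: (batch,) learning signal from a centralized critic
                (treated as a constant here)
    """
    log_pi = F.log_softmax(logits, dim=-1)          # log pi(.|s)
    pi = log_pi.exp()
    chosen_log_pi = log_pi.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

    # Standard policy-gradient term.
    policy_term = -(chosen_log_pi * advantages.detach()).mean()

    # Policy entropy H(pi), averaged over the batch.
    entropy = -(pi * log_pi).sum(dim=-1).mean()

    # Adaptive scaling (assumption): normalizing by the detached entropy keeps
    # the bonus value fixed but scales its gradient by 1 / H(pi), so the
    # regularizer pushes harder as the policy becomes nearly deterministic.
    adaptive_bonus = entropy / (entropy.detach() + eps)

    return policy_term - entropy_coef * adaptive_bonus
```

Because the denominator is detached, the bonus evaluates to roughly 1, but its gradient is the entropy gradient divided by the current entropy magnitude, which is one way to obtain exploration pressure that adapts to how deterministic the policy currently is.
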
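The Experiment Setup row notes that all methods use GRU modules and share the parameters of the individual agent networks, as is common in PyMARL-style implementations. The snippet below is a hedged sketch of such a shared recurrent agent; the hidden size, the appended one-hot agent ID, and all names are assumptions for illustration rather than the authors' actual architecture.

```python
import torch
import torch.nn as nn

class SharedGRUAgent(nn.Module):
    """One network reused by every agent; the recurrent state carries the
    agent's action-observation history."""

    def __init__(self, obs_dim, n_actions, n_agents, hidden_dim=64):
        super().__init__()
        # Appending a one-hot agent ID is a common way to let shared weights
        # specialize per agent (an assumption, not stated in the review).
        in_dim = obs_dim + n_agents
        self.fc_in = nn.Linear(in_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.fc_out = nn.Linear(hidden_dim, n_actions)

    def init_hidden(self, batch_size, n_agents):
        return torch.zeros(batch_size * n_agents, self.rnn.hidden_size)

    def forward(self, obs, agent_ids_onehot, hidden):
        x = torch.relu(self.fc_in(torch.cat([obs, agent_ids_onehot], dim=-1)))
        hidden = self.rnn(x, hidden)      # same parameters applied to every agent
        logits = self.fc_out(hidden)      # per-agent action logits
        return logits, hidden
```

The same module, and therefore the same weights, is applied to every agent's observation stream; any per-agent specialization comes only from the observation itself and the appended agent ID.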