Optimistic Multi-Agent Policy Gradient

Authors: Wenshuai Zhao, Yi Zhao, Zhiyuan Li, Juho Kannala, Joni Pajarinen

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive evaluations on a diverse set of tasks including the Multi-agent MuJoCo and Overcooked benchmarks, our method outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest.
Researcher Affiliation | Academia | (1) Department of Electrical Engineering and Automation, Aalto University, Finland; (2) School of Computer Science and Engineering, University of Electronic Science and Technology of China, China; (3) Department of Computer Science, Aalto University, Finland; (4) University of Oulu, Finland.
Pseudocode | Yes | Algorithm 1 Optimistic Multi-Agent Proximal Policy Optimization (OptiMAPPO)
Open Source Code | Yes | Source Code: https://github.com/wenshuaizhao/optimappo
Open Datasets | Yes | In extensive evaluations on a diverse set of tasks including the Multi-agent MuJoCo and Overcooked benchmarks, our method outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest.
Dataset Splits | No | The paper mentions '100 evaluation episodes' for MA-MuJoCo, 'episode length of repeated games as 25' for matrix games, and 'Episode Length 400' for Overcooked tasks. However, it does not explicitly provide percentages or counts for training, validation, or test dataset splits.
Hardware Specification | No | The paper mentions 'computational resources provided by the Aalto Science-IT project and CSC, Finnish IT Center for Science', but it does not specify any exact GPU or CPU models, memory amounts, or detailed computer specifications used for running experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software components or libraries, such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We use the same hyperparameters listed in Table 5. The implementation is based on the HAPPO (Kuba et al., 2022) codebase, and the other hyperparameters are the default. ... In all the tasks of Overcooked, we use the same hyperparameters listed in Table 6.