Optimistic Multi-Agent Policy Gradient
Authors: Wenshuai Zhao, Yi Zhao, Zhiyuan Li, Juho Kannala, Joni Pajarinen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive evaluations on a diverse set of tasks including the Multi-agent MuJoCo and Overcooked benchmarks, our method outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest. |
| Researcher Affiliation | Academia | (1) Department of Electrical Engineering and Automation, Aalto University, Finland; (2) School of Computer Science and Engineering, University of Electronic Science and Technology of China, China; (3) Department of Computer Science, Aalto University, Finland; (4) University of Oulu, Finland. |
| Pseudocode | Yes | Algorithm 1 Optimistic Multi-Agent Proximal Policy Optimization (OptiMAPPO) |
| Open Source Code | Yes | Source Code: https://github.com/wenshuaizhao/optimappo |
| Open Datasets | Yes | In extensive evaluations on a diverse set of tasks including the Multi-agent MuJoCo and Overcooked benchmarks, our method outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest. |
| Dataset Splits | No | The paper mentions '100 evaluation episodes' for MA-MuJoCo and 'episode length of repeated games as 25' for matrix games, and 'Episode Length 400' for Overcooked tasks. However, it does not explicitly provide percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions 'computational resources provided by the Aalto Science-IT project and CSC, Finnish IT Center for Science', but it does not specify any exact GPU or CPU models, memory amounts, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software components or libraries, such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use the same hyperparameters listed in Table 5. The implementation is based on the HAPPO (Kuba et al., 2022) codebase, and the other hyperparameters are the default. ... In all the tasks of Overcooked, we use the same hyperparameters listed in Table 6. |