Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning

Authors: Lijun Zhang, Lin Li, Wei Wei, Huizhong Song, Yaodong Yang, Jiye Liang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our method via several numerical experiments. Our experiments aim to answer the following questions: First, how does the cost and reward performance of Scal-MAPPO-L compare with existing methods on challenging multi-agent safe tasks? Second, how do different values of κ affect the performance of Scal-MAPPO-L, and can advantage truncation effectively alleviate the computational load? (An illustrative sketch of κ-hop truncation appears after this table.)
Researcher Affiliation | Academia | Lijun Zhang1, Lin Li1, Wei Wei1, Huizhong Song1, Yaodong Yang2, Jiye Liang1. 1. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi, China. 2. Institute for AI, Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1 Scalable MAPPO-Lagrangian (an illustrative sketch of the Lagrangian update appears after this table).
Open Source Code | Yes | The paper provides open access to the data and code in the supplemental material.
Open Datasets | Yes | Safe MAMuJoCo [10] is an extension of MAMuJoCo [19], which preserves the agents, physics simulator, background environment, and reward function and comes with obstacles such as walls or pitfalls.
Dataset Splits | No | The paper does not explicitly mention validation splits or percentages for the datasets used in the experiments.
Hardware Specification | Yes | We executed our code on a computer with an NVIDIA GeForce RTX 4090 (GPU) and an Intel Core i9-13900K (CPU).
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We train Scal-MAPPO-L with the same network architecture and hyperparameters as the original MAPPO-L implementation.
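
The research-question row above mentions the truncation parameter κ and its effect on computational load. The snippet below is a minimal illustrative sketch, not the paper's implementation, of how a κ-hop neighbourhood truncation can shrink the input of each agent's advantage estimator; the helper name kappa_hop_neighbors, the ring interaction graph, and the observation dimensions are assumptions made purely for illustration.

    import numpy as np

    def kappa_hop_neighbors(adj, agent, kappa):
        """Indices of agents within `kappa` hops of `agent` (agent included),
        given a boolean adjacency matrix `adj` of the interaction graph."""
        reach = np.zeros(adj.shape[0], dtype=bool)
        reach[agent] = True
        for _ in range(kappa):
            reach |= adj[reach].any(axis=0)   # expand the neighbourhood by one hop
        return np.flatnonzero(reach)

    # Toy interaction graph: 6 agents arranged in a ring.
    n = 6
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        adj[i, (i - 1) % n] = adj[i, (i + 1) % n] = True

    joint_obs = np.random.randn(n, 8)          # hypothetical per-agent observations

    kappa = 1
    for i in range(n):
        nbrs = kappa_hop_neighbors(adj, i, kappa)
        local_input = joint_obs[nbrs].reshape(-1)
        # With kappa = 1, each agent's advantage estimator conditions on 3 agents
        # instead of all 6, so its input (and compute) shrinks with kappa.

Smaller κ gives cheaper, more local estimates, while larger κ approaches the full joint quantity, which is the trade-off the paper's second research question probes.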
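Likewise, the pseudocode row refers to Algorithm 1 (Scalable MAPPO-Lagrangian), which is given only in the paper. The sketch below shows, under the usual MAPPO-Lagrangian assumptions rather than as the paper's actual algorithm, the two ingredients such a method combines: a PPO-style clipped surrogate with a Lagrangian cost penalty, and a dual-ascent update of the multiplier toward a cost limit. The function names and hyperparameter values (clip_eps, lr, cost_limit) are illustrative assumptions, not taken from the paper.

    import torch

    def mappo_lagrangian_loss(log_probs, old_log_probs, reward_adv, cost_adv,
                              lagrange_mult, clip_eps=0.2):
        """PPO-style clipped surrogate with a Lagrangian penalty on the cost advantage."""
        ratio = torch.exp(log_probs - old_log_probs)
        reward_term = torch.min(
            ratio * reward_adv,
            torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * reward_adv,
        )
        cost_term = ratio * cost_adv               # penalise increases in expected cost
        return -(reward_term - lagrange_mult * cost_term).mean()

    def update_lagrange_multiplier(lagrange_mult, mean_episode_cost, cost_limit, lr=0.01):
        """Dual ascent: grow the multiplier while the cost constraint is violated,
        shrink it otherwise, and keep it non-negative."""
        return max(0.0, lagrange_mult + lr * (mean_episode_cost - cost_limit))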