Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning

Authors: Lijun Zhang, Lin Li, Wei Wei, Huizhong Song, Yaodong Yang, Jiye Liang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our method via several numerical experiments. Our experiments aim to answer the following questions: First, how does the cost and reward performance of Scal-MAPPO-L compare with existing methods on challenging multi-agent safe tasks? Second, how do different values of κ affect the performance of Scal-MAPPO-L, and can advantage truncation effectively alleviate the computational load? (An illustrative sketch of κ-hop truncation appears after this table.)
Researcher Affiliation | Academia | Lijun Zhang1, Lin Li1, Wei Wei1, Huizhong Song1, Yaodong Yang2, Jiye Liang1. 1. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi, China. 2. Institute for AI, Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1 Scalable MAPPO-Lagrangian (an illustrative sketch of the Lagrangian update appears after this table).
Open Source Code | Yes | The paper provides open access to the data and code in the supplemental material.
Open Datasets | Yes | Safe MAMuJoCo [10] is an extension of MAMuJoCo [19], which preserves the agents, physics simulator, background environment, and reward function and comes with obstacles such as walls or pitfalls.
Dataset Splits | No | The paper does not explicitly mention validation splits or percentages for the datasets used in the experiments.
Hardware Specification | Yes | We executed our code on a computer with an NVIDIA GeForce RTX 4090 (GPU) and an Intel Core i9-13900K (CPU).
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We train Scal-MAPPO-L with the same network architecture and hyperparameters as the original MAPPO-L implementation.
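
The research-question row above mentions the truncation parameter κ and its effect on computational load. The snippet below is a minimal illustrative sketch, not the paper's implementation, of how a κ-hop neighbourhood truncation can shrink the input of each agent's advantage estimator; the helper name kappa_hop_neighbors, the ring interaction graph, and the observation dimensions are assumptions made purely for illustration.

    import numpy as np

    def kappa_hop_neighbors(adj, agent, kappa):
        """Indices of agents within `kappa` hops of `agent` (agent included),
        given a boolean adjacency matrix `adj` of the interaction graph."""
        reach = np.zeros(adj.shape[0], dtype=bool)
        reach[agent] = True
        for _ in range(kappa):
            reach |= adj[reach].any(axis=0)   # expand the neighbourhood by one hop
        return np.flatnonzero(reach)

    # Toy interaction graph: 6 agents arranged in a ring.
    n = 6
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        adj[i, (i - 1) % n] = adj[i, (i + 1) % n] = True

    joint_obs = np.random.randn(n, 8)          # hypothetical per-agent observations

    kappa = 1
    for i in range(n):
        nbrs = kappa_hop_neighbors(adj, i, kappa)
        local_input = joint_obs[nbrs].reshape(-1)
        # With kappa = 1, each agent's advantage estimator conditions on 3 agents
        # instead of all 6, so its input (and compute) shrinks with kappa.

Smaller κ gives cheaper, more local estimates, while larger κ approaches the full joint quantity, which is the trade-off the paper's second research question probes.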
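Likewise, the pseudocode row refers to Algorithm 1 (Scalable MAPPO-Lagrangian), which is given only in the paper. The sketch below shows, under the usual MAPPO-Lagrangian assumptions rather than as the paper's actual algorithm, the two ingredients such a method combines: a PPO-style clipped surrogate with a Lagrangian cost penalty, and a dual-ascent update of the multiplier toward a cost limit. The function names and hyperparameter values (clip_eps, lr, cost_limit) are illustrative assumptions, not taken from the paper.

    import torch

    def mappo_lagrangian_loss(log_probs, old_log_probs, reward_adv, cost_adv,
                              lagrange_mult, clip_eps=0.2):
        """PPO-style clipped surrogate with a Lagrangian penalty on the cost advantage."""
        ratio = torch.exp(log_probs - old_log_probs)
        reward_term = torch.min(
            ratio * reward_adv,
            torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * reward_adv,
        )
        cost_term = ratio * cost_adv               # penalise increases in expected cost
        return -(reward_term - lagrange_mult * cost_term).mean()

    def update_lagrange_multiplier(lagrange_mult, mean_episode_cost, cost_limit, lr=0.01):
        """Dual ascent: grow the multiplier while the cost constraint is violated,
        shrink it otherwise, and keep it non-negative."""
        return max(0.0, lagrange_mult + lr * (mean_episode_cost - cost_limit))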