reproducibilityindex.ai

Multi-Agent First Order Constrained Optimization in Policy Space

Authors: Youpeng Zhao, Yaodong Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results show that our approach achieves remarkable performance while satisfying safe constraints on several safe MARL benchmarks. We evaluate the effectiveness of our algorithm on two benchmarks of safe MARL: Safe MAMuJoCo and Safe Multi-Agent Isaac Gym (MAIG).
Researcher Affiliation	Academia	Youpeng Zhao (1), Yaodong Yang (2), Zhenbo Lu (3), Wengang Zhou (1, 3), Houqiang Li (1, 3). (1) University of Science and Technology of China; (2) Institute for AI, Peking University; (3) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Pseudocode	Yes	The procedure of our algorithm is presented in Appendix E. ... Algorithm 1: MAFOCOPS
Open Source Code	No	No concrete statement or link for open-source code of the described method (MAFOCOPS) is provided. The paper mentions using "the MACPO codebase" but does not announce a release of the authors' own code.
Open Datasets	Yes	We evaluate the effectiveness of our algorithm on two benchmarks of safe MARL: Safe MAMuJoCo and Safe Multi-Agent Isaac Gym (MAIG). The former is a safety-aware modification of MAMuJoCo [38], where there exist obstacles in the environment. Meanwhile, Safe MAIG is developed on top of Isaac Gym [39], a GPU-based platform for robotics tasks. Being an extension of Dexterous Hands [40], Safe MAIG requires agents to control the robot hands while optimizing both the reward and safety performance.
Dataset Splits	No	No explicit training/validation/test dataset splits are provided. The paper discusses how "cost thresholds are determined by taking 50% of the cost achieved by standard MARL algorithms after 1 million sample runs" and "cost thresholds are set as 25% of the cost obtained by standard MARL algorithms after running for one-tenth of the entire training process", which relates to experiment setup but not data splitting.
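The two threshold rules quoted above reduce to simple arithmetic: a fixed fraction of the cost that an unconstrained MARL baseline reaches at a given point in training. The sketch below is illustrative only. The function name `cost_threshold` and the baseline cost values are hypothetical, not taken from the paper, and the quoted text does not say which rule applies to which benchmark.

```python
def cost_threshold(baseline_cost: float, fraction: float) -> float:
    """Constraint threshold as a fraction of a baseline (unconstrained) MARL cost."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return fraction * baseline_cost

# Hypothetical baseline costs: NOT values reported in the paper.
baseline_after_1m_samples = 40.0       # standard MARL cost after 1 million samples
baseline_after_tenth_training = 12.0   # standard MARL cost after 1/10 of training

threshold_50pct = cost_threshold(baseline_after_1m_samples, 0.50)      # 50% rule
threshold_25pct = cost_threshold(baseline_after_tenth_training, 0.25)  # 25% rule
print(threshold_50pct, threshold_25pct)
```

Reproducing the paper's thresholds would additionally require the baseline cost values themselves, which the quoted text does not provide.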
Hardware Specification	Yes	Both sets of experiments are carried out using the MACPO codebase and our experiments are conducted on GeForce RTX 3090 GPUs.
Software Dependencies	No	No specific software dependencies with version numbers are listed. The paper states, "As our implementation is based on the codebase provided by MACPO [24]", but does not detail specific software versions beyond that.
Experiment Setup	Yes	More implementation details can be found in supplementary materials. ... For MAFOCOPS, the Lagrange multipliers, namely λ and νmax, we utilize are 2.2 and 1.3, respectively, which can be found in Tables 3 and 4. ... We present the specific hyperparameters that we use in our experiments in Table 5 (as most parameters are unchanged, we only report the changed ones or unique parameters in our algorithm). Table 5: Different hyperparameters used for MACPO, MAPPO-L and MAFOCOPS.
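Because no MAFOCOPS code is released, the sketch below records only the two reported values and shows one plausible way νmax could be used. It assumes, following the single-agent FOCOPS convention, that the cost multiplier ν is updated by a projected dual step and clipped to [0, νmax]. The class name, the function `update_nu`, the step size, and the cost numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MAFOCOPSMultipliers:
    lam: float = 2.2     # λ as reported (Tables 3 and 4); its use in the loss is not shown here
    nu_max: float = 1.3  # νmax as reported: assumed upper bound on the cost multiplier ν


def update_nu(nu: float, cost_limit: float, episode_cost: float,
              step_size: float, nu_max: float) -> float:
    """Hypothetical projected dual step (assumed FOCOPS-style, not the authors' code).

    ν increases when the observed cost exceeds the limit, decreases otherwise,
    and is then clipped to the interval [0, nu_max].
    """
    nu = nu - step_size * (cost_limit - episode_cost)
    return min(max(nu, 0.0), nu_max)


params = MAFOCOPSMultipliers()
# Hypothetical numbers: cost limit 20, observed episode cost 30, step size 0.01.
nu = update_nu(0.0, cost_limit=20.0, episode_cost=30.0,
               step_size=0.01, nu_max=params.nu_max)
print(nu)
```

Clipping at νmax keeps a persistent constraint violation from driving the multiplier, and hence the cost penalty, arbitrarily high.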