Multi-Agent First Order Constrained Optimization in Policy Space
Authors: Youpeng Zhao, Yaodong Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our approach achieves remarkable performance while satisfying safe constraints on several safe MARL benchmarks. We evaluate the effectiveness of our algorithm on two benchmarks of safe MARL: Safe MAMuJoCo and Safe Multi-Agent Isaac Gym (MAIG). |
| Researcher Affiliation | Academia | Youpeng Zhao (1), Yaodong Yang (2), Zhenbo Lu (3), Wengang Zhou (1, 3), Houqiang Li (1, 3). (1) University of Science and Technology of China; (2) Institute for AI, Peking University; (3) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | Yes | The procedure of our algorithm is presented in Appendix E. Algorithm 1 MAFOCOPS |
| Open Source Code | No | No concrete statement or link for the open-source code of the described methodology (MAFOCOPS) is provided. The paper mentions using "the MACPO codebase" but does not specify their own code release. |
| Open Datasets | Yes | We evaluate the effectiveness of our algorithm on two benchmarks of safe MARL: Safe MAMuJoCo and Safe Multi-Agent Isaac Gym (MAIG). The former is a safety-aware modification of MAMuJoCo [38], where there exist obstacles in the environment. Meanwhile, Safe MAIG is developed on top of Isaac Gym [39], a GPU-based platform for robotics tasks. Being an extension of DexterousHands [40], Safe MAIG requires agents to control the robot hands while optimizing both the reward and safety performance. |
| Dataset Splits | No | No explicit training/validation/test dataset splits are provided. The paper discusses how "cost thresholds are determined by taking 50% of the cost achieved by standard MARL algorithms after 1 million sample runs" and "cost thresholds are set as 25% of the cost obtained by standard MARL algorithms after running for one-tenth of the entire training process", which relates to experiment setup but not data splitting. |
| Hardware Specification | Yes | Both sets of experiments are carried out using the MACPO codebase and our experiments are conducted on GeForce RTX 3090 GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers are listed. The paper states, "As our implementation is based on the codebase provided by MACPO [24]", but does not detail specific software versions beyond that. |
| Experiment Setup | Yes | More implementation details can be found in supplementary materials. ... For MAFOCOPS, the Lagrange multipliers, namely λ and νmax, we utilize are 2.2 and 1.3, respectively, which can be found in Tables 3 and 4. ... We present the specific hyperparameters that we use in our experiments in Table 5 (as most parameters are unchanged, we only report the changed ones or unique parameters in our algorithm). Table 5: Different hyperparameters used for MACPO, MAPPO-L and MAFOCOPS. |
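
The cost-threshold rule quoted in the Dataset Splits row (a fixed fraction, 50% or 25%, of the cost reached by a standard MARL baseline) is simple arithmetic. The sketch below is only an illustration of that rule, not code from the paper or the MACPO codebase: the function name `cost_threshold` and the baseline cost values are hypothetical, and the summary above does not say which fraction applies to which benchmark.

```python
def cost_threshold(baseline_cost: float, fraction: float) -> float:
    """Return a safety cost threshold as a fraction of a baseline's cost.

    baseline_cost: episode cost reached by an unconstrained (standard) MARL
                   baseline; the values used below are made up for illustration.
    fraction:      share of that cost allowed as the constraint limit,
                   e.g. 0.50 or 0.25 as quoted in the table above.
    """
    if baseline_cost < 0.0:
        raise ValueError("baseline_cost must be non-negative")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return fraction * baseline_cost


# Hypothetical baseline costs, chosen only to show the arithmetic.
threshold_half = cost_threshold(baseline_cost=40.0, fraction=0.50)
threshold_quarter = cost_threshold(baseline_cost=12.0, fraction=0.25)

print(threshold_half, threshold_quarter)
```

With these made-up inputs, a baseline cost of 40 under the 50% rule gives a limit of 20, and a baseline cost of 12 under the 25% rule gives a limit of 3.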