Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game
Authors: Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on matrix game, Level-Based Foraging and StarCraft II indicate that our method successfully acquires intricate micromanagement skills and adaptively aligns with allies under worst-case perturbations, showing resilience against non-oblivious adversaries, random allies, observation-based attacks, and transfer-based attacks. |
| Researcher Affiliation | Academia | 1 SKLSDE Lab, Beihang University, China; 2 Zhongguancun Laboratory, China; 3 Institute of Artificial Intelligence, Peking University & BigAI, China; 4 Institute of Data Space, Hefei Comprehensive National Science Center, China |
| Pseudocode | Yes | See the pseudo-code for our algorithm in Appendix B. |
| Open Source Code | Yes | Code and demo videos available at https://github.com/DIG-Beihang/EIR-MAPPO. |
| Open Datasets | Yes | Environments include a toy iterative matrix game proposed by Han et al. (2022), rewarding XNOR or XOR actions at different states, 12x12-4p-3f-c of Level-Based Foraging (LBF) (Papoudakis et al., 2020), and 4m vs 3m of the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019). (A minimal environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper describes training and evaluation phases within the reinforcement learning environments, but it does not specify explicit training/validation/test dataset splits as data is generated interactively within the environments. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using MAPPO, PPO, and GRU, and references a codebase, but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Next, we present all hyperparameters of each environment in the table below. These hyperparameters follow the defaults of previous papers, including MAPPO (Yu et al., 2021), HARL (Zhong et al., 2023) and FACMAC (Peng et al., 2021). Table 1: Hyperparameters for MAPPO, RMAAC, EAR-MAPPO, EIR-MAPPO in the toy environment: rollouts: 10; mini-batch num: 1; PPO epoch: 5; gamma: 0.99; max grad norm: 10; PPO clip: 0.05; gain: 0.01; max episode len: 200; entropy coef: 0.01; actor network: MLP; actor lr: 5e-5; eval episode: 32; hidden dim: 128; critic lr: 5e-5; optimizer: Adam; belief network: GRU; adversary lr: 5e-5; Huber loss: True; use PopArt: True; belief lr: 5e-5; Huber delta: 10; adversary interval: 10; GAE lambda: 0.95; RMAAC ϵ: 0.05. (These values are gathered into a configuration sketch after the table.) |
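The Open Datasets row lists publicly available benchmarks. Below is a minimal sketch of how such environments are typically instantiated; the Level-Based Foraging Gym ID and the SMAC map name are assumptions (the paper's repository may register custom variants), and the toy matrix game from Han et al. (2022) is omitted because it has no standard package.

```python
# Minimal environment-setup sketch; the environment ID and map name are assumptions,
# not taken from the paper's repository.
import gym
import lbforaging  # importing this package registers the Foraging-* Gym environments
from smac.env import StarCraft2Env

# Level-Based Foraging: "12x12-4p-3f-c" presumably denotes a cooperative 12x12 grid
# with 4 players and 3 food items.
lbf_env = gym.make("Foraging-12x12-4p-3f-coop-v2")

# SMAC: "4m vs 3m" is not one of the default SMAC maps, so it is likely a custom map
# shipped with the authors' code.
smac_env = StarCraft2Env(map_name="4m_vs_3m")
```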
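For readability, the flattened hyperparameters of Table 1 can be collected into a single configuration dictionary. The values are copied verbatim from the table; the key names are illustrative choices, not the identifiers used in the authors' codebase.

```python
# Hyperparameters reported in Table 1 (toy environment), gathered into one dict.
# Key names are illustrative; the EIR-MAPPO codebase may use different identifiers.
TOY_ENV_CONFIG = {
    "rollouts": 10,
    "mini_batch_num": 1,
    "ppo_epoch": 5,
    "gamma": 0.99,
    "max_grad_norm": 10,
    "ppo_clip": 0.05,
    "gain": 0.01,
    "max_episode_len": 200,
    "entropy_coef": 0.01,
    "actor_network": "MLP",
    "actor_lr": 5e-5,
    "eval_episodes": 32,
    "hidden_dim": 128,
    "critic_lr": 5e-5,
    "optimizer": "Adam",
    "belief_network": "GRU",
    "adversary_lr": 5e-5,
    "use_huber_loss": True,
    "use_popart": True,
    "belief_lr": 5e-5,
    "huber_delta": 10,
    "adversary_interval": 10,
    "gae_lambda": 0.95,
    "rmaac_epsilon": 0.05,  # perturbation budget for the RMAAC baseline (interpretation assumed)
}
```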