Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game
Authors: Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on matrix game, Level-Based Foraging and StarCraft II indicate that our method successfully acquires intricate micromanagement skills and adaptively aligns with allies under worst-case perturbations, showing resilience against non-oblivious adversaries, random allies, observation-based attacks, and transfer-based attacks. |
| Researcher Affiliation | Academia | 1 SKLSDE Lab, Beihang University, China; 2 Zhongguancun Laboratory, China; 3 Institute of Artificial Intelligence, Peking University & BigAI, China; 4 Institute of Data Space, Hefei Comprehensive National Science Center, China |
| Pseudocode | Yes | See the pseudo-code for our algorithm in Appendix B. |
| Open Source Code | Yes | Code and demo videos available at https://github.com/DIG-Beihang/EIR-MAPPO. |
| Open Datasets | Yes | Environments include a toy iterative matrix game proposed by Han et al. (2022), rewarding XNOR or XOR actions at different states, 12x12-4p-3f-c of Level-Based Foraging (LBF) (Papoudakis et al., 2020), and 4m vs 3m of the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019). (A minimal environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper describes training and evaluation phases within the reinforcement learning environments, but it does not specify explicit training/validation/test dataset splits as data is generated interactively within the environments. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using MAPPO, PPO, and GRU, and references a codebase, but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Next, we present all hyperparameters of each environment in the table below. These hyperparameters follow the defaults of previous papers, including MAPPO (Yu et al., 2021), HARL (Zhong et al., 2023) and FACMAC (Peng et al., 2021). Table 1: Hyperparameters for MAPPO, RMAAC, EAR-MAPPO, EIR-MAPPO in the toy environment: rollouts: 10; mini-batch num: 1; PPO epoch: 5; gamma: 0.99; max grad norm: 10; PPO clip: 0.05; gain: 0.01; max episode len: 200; entropy coef: 0.01; actor network: MLP; actor lr: 5e-5; eval episode: 32; hidden dim: 128; critic lr: 5e-5; optimizer: Adam; belief network: GRU; adversary lr: 5e-5; Huber loss: True; use PopArt: True; belief lr: 5e-5; Huber delta: 10; adversary interval: 10; GAE lambda: 0.95; RMAAC ϵ: 0.05. (These values are gathered into a configuration sketch after the table.) |
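The Open Datasets row lists publicly available benchmarks. Below is a minimal sketch of how such environments are typically instantiated; the Level-Based Foraging Gym ID and the SMAC map name are assumptions (the paper's repository may register custom variants), and the toy matrix game from Han et al. (2022) is omitted because it has no standard package.

```python
# Minimal environment-setup sketch; the environment ID and map name are assumptions,
# not taken from the paper's repository.
import gym
import lbforaging  # importing this package registers the Foraging-* Gym environments
from smac.env import StarCraft2Env

# Level-Based Foraging: "12x12-4p-3f-c" presumably denotes a cooperative 12x12 grid
# with 4 players and 3 food items.
lbf_env = gym.make("Foraging-12x12-4p-3f-coop-v2")

# SMAC: "4m vs 3m" is not one of the default SMAC maps, so it is likely a custom map
# shipped with the authors' code.
smac_env = StarCraft2Env(map_name="4m_vs_3m")
```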
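For readability, the flattened hyperparameters of Table 1 can be collected into a single configuration dictionary. The values are copied verbatim from the table; the key names are illustrative choices, not the identifiers used in the authors' codebase.

```python
# Hyperparameters reported in Table 1 (toy environment), gathered into one dict.
# Key names are illustrative; the EIR-MAPPO codebase may use different identifiers.
TOY_ENV_CONFIG = {
    "rollouts": 10,
    "mini_batch_num": 1,
    "ppo_epoch": 5,
    "gamma": 0.99,
    "max_grad_norm": 10,
    "ppo_clip": 0.05,
    "gain": 0.01,
    "max_episode_len": 200,
    "entropy_coef": 0.01,
    "actor_network": "MLP",
    "actor_lr": 5e-5,
    "eval_episodes": 32,
    "hidden_dim": 128,
    "critic_lr": 5e-5,
    "optimizer": "Adam",
    "belief_network": "GRU",
    "adversary_lr": 5e-5,
    "use_huber_loss": True,
    "use_popart": True,
    "belief_lr": 5e-5,
    "huber_delta": 10,
    "adversary_interval": 10,
    "gae_lambda": 0.95,
    "rmaac_epsilon": 0.05,  # perturbation budget for the RMAAC baseline (interpretation assumed)
}
```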