Conservative Offline Policy Adaptation in Multi-Agent Games
Authors: Chengjie Wu, Pingzhong Tang, Jun Yang, Yujing Hu, Tangjie Lv, Changjie Fan, Chongjie Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that CSP outperforms non-conservative baselines in various environments, including Maze, predator-prey, MuJoCo, and Google Football. |
| Researcher Affiliation | Collaboration | Chengjie Wu¹, Pingzhong Tang¹˒², Jun Yang³, Yujing Hu⁴, Tangjie Lv⁴, Changjie Fan⁴, Chongjie Zhang⁵. ¹Institute for Interdisciplinary Information Sciences, Tsinghua University; ²Turingsense; ³Department of Automation, Tsinghua University; ⁴Fuxi AI Lab, NetEase; ⁵Department of Computer Science & Engineering, Washington University in St. Louis |
| Pseudocode | Yes | Algorithm 1 Constrained Self-Play (CSP) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | Empirically, we evaluate the effectiveness of our algorithm in four environments: a didactic maze environment, predator-prey in MPE [39], a competitive two-agent MuJoCo environment [2, 10] requiring continuous control, and Google Football [21]. |
| Dataset Splits | No | The paper discusses training and testing performance but does not specify validation dataset splits (e.g., percentages or sample counts for a validation set). |
| Hardware Specification | Yes | Each seed is run on a GPU server with one NVIDIA P100 GPU, and Intel(R) Xeon(R) Gold 6145 CPU @ 2.00GHz CPU. |
| Software Dependencies | No | The paper mentions using MAPPO [48] as the base RL algorithm but does not specify versions for programming languages or libraries (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | Yes | "The hyper-parameters are listed in Table 6, 7, and 8." Table 5 (hyper-parameters for maze): ppo_epoch = 1; num_mini_batch = 1; entropy_coef = 0.3; use_gae = True; gamma = 0.99999; gae_lambda = 0.95; critic_lr = 7e-4; lr = 7e-4; weight_decay = 0; adam_eps = 1e-5; n_rollout_threads = 20; ppo_episode_length = 12; data_chunk_length = 12; steps = 1.8K; max_grad_norm = 0.5; bc_regularization_coef = 10; bc_batch_size = 8; network = MLP |
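
The maze hyper-parameters quoted in the Experiment Setup row can be collected into a plain configuration mapping for anyone attempting a reproduction. The sketch below is only illustrative: the dictionary name `maze_hparams` and the derived value `samples_per_iteration` are assumptions introduced here, not names from the paper, and no official code release is available to confirm how each key is consumed. The values themselves are copied from Table 5 as quoted above.

```python
# Hypothetical container for the Table 5 (maze) hyper-parameters.
# Keys and values are copied from the table; the dict name is an assumption.
maze_hparams = {
    "ppo_epoch": 1,
    "num_mini_batch": 1,
    "entropy_coef": 0.3,
    "use_gae": True,
    "gamma": 0.99999,
    "gae_lambda": 0.95,
    "critic_lr": 7e-4,
    "lr": 7e-4,
    "weight_decay": 0,
    "adam_eps": 1e-5,
    "n_rollout_threads": 20,
    "ppo_episode_length": 12,
    "data_chunk_length": 12,
    "steps": 1800,  # written as "1.8K" in the source; exact meaning not stated there
    "max_grad_norm": 0.5,
    "bc_regularization_coef": 10,
    "bc_batch_size": 8,
    "network": "MLP",
}

# Assumption: each rollout thread collects ppo_episode_length transitions
# per training iteration, as is common in MAPPO-style training loops.
samples_per_iteration = (
    maze_hparams["n_rollout_threads"] * maze_hparams["ppo_episode_length"]
)

if __name__ == "__main__":
    for name, value in maze_hparams.items():
        print(f"{name}: {value}")
    print("samples per iteration (assumed):", samples_per_iteration)
```

Under that assumption, one iteration would gather 20 × 12 = 240 transitions, and with `ppo_epoch` and `num_mini_batch` both set to 1 each batch would be used for a single gradient step.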