reproducibilityindex.ai

Conservative Offline Policy Adaptation in Multi-Agent Games

Authors: Chengjie Wu, Pingzhong Tang, Jun Yang, Yujing Hu, Tangjie Lv, Changjie Fan, Chongjie Zhang

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results demonstrate that CSP outperforms non-conservative baselines in various environments, including Maze, predator-prey, MuJoCo, and Google Football.
Researcher Affiliation	Collaboration	Chengjie Wu (1), Pingzhong Tang (1, 2), Jun Yang (3), Yujing Hu (4), Tangjie Lv (4), Changjie Fan (4), Chongjie Zhang (5). (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Turingsense; (3) Department of Automation, Tsinghua University; (4) Fuxi AI Lab, NetEase; (5) Department of Computer Science & Engineering, Washington University in St. Louis
Pseudocode	Yes	Algorithm 1 Constrained Self-Play (CSP)
Open Source Code	No	The paper does not provide an explicit statement or link for open-source code for the described methodology.
Open Datasets	Yes	Empirically, we evaluate the effectiveness of our algorithm in four environments: a didactic maze environment, predator-prey in MPE [39], a competitive two-agent MuJoCo environment [2, 10] requiring continuous control, and Google Football [21].
Dataset Splits	No	The paper discusses training and testing performance but does not specify validation dataset splits (e.g., percentages or sample counts for a validation set).
Hardware Specification	Yes	Each seed is run on a GPU server with one NVIDIA P100 GPU and an Intel(R) Xeon(R) Gold 6145 CPU @ 2.00GHz.
Software Dependencies	No	The paper mentions using MAPPO [48] as the base RL algorithm but does not specify versions for programming languages or libraries (e.g., Python 3.x, PyTorch x.x).
Experiment Setup	Yes	The hyper-parameters are listed in Tables 6, 7, and 8. Table 5 (hyper-parameters for maze): ppo_epoch = 1; num_mini_batch = 1; entropy_coef = 0.3; use_gae = True; gamma = 0.99999; gae_lambda = 0.95; critic_lr = 7e-4; lr = 7e-4; weight_decay = 0; adam_eps = 1e-5; n_rollout_threads = 20; ppo_episode_length = 12; data_chunk_length = 12; steps = 1.8K; max_grad_norm = 0.5; bc_regularization_coef = 10; bc_batch_size = 8; network = MLP
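
For anyone re-running the maze experiment, the Table 5 values quoted above can be collected into a single configuration object. The following is a minimal sketch only: the names `MAZE_HPARAMS` and `transitions_per_rollout` are hypothetical and do not come from the paper or any released code, and the derived quantity assumes that each rollout thread collects `ppo_episode_length` steps per rollout, which the summary does not state.

```python
# Hypothetical container for the maze hyper-parameters quoted from Table 5.
# Key names and values are copied from the summary; nothing else is implied
# about how the authors' code consumes them.
MAZE_HPARAMS = {
    "ppo_epoch": 1,
    "num_mini_batch": 1,
    "entropy_coef": 0.3,
    "use_gae": True,
    "gamma": 0.99999,
    "gae_lambda": 0.95,
    "critic_lr": 7e-4,
    "lr": 7e-4,
    "weight_decay": 0,
    "adam_eps": 1e-5,
    "n_rollout_threads": 20,
    "ppo_episode_length": 12,
    "data_chunk_length": 12,
    "steps": "1.8K",  # kept verbatim; the unit is not stated in the summary
    "max_grad_norm": 0.5,
    "bc_regularization_coef": 10,
    "bc_batch_size": 8,
    "network": "MLP",
}


def transitions_per_rollout(hparams: dict) -> int:
    """Transitions gathered in one rollout round.

    Assumption (not from the source): every rollout thread collects exactly
    `ppo_episode_length` environment steps per round.
    """
    return hparams["n_rollout_threads"] * hparams["ppo_episode_length"]


if __name__ == "__main__":
    print(transitions_per_rollout(MAZE_HPARAMS))
```

Under that assumption, 20 threads of 12 steps each give 240 transitions per rollout round. With `num_mini_batch = 1` and `ppo_epoch = 1`, that would correspond to a single gradient pass over each round's data.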