Multi-Agent Interactions Modeling with Correlated Policies

Authors: Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods.
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Huawei Noah's Ark Lab; {minghuanliu, mingak, wnzhang, yyu}@sjtu.edu.cn, {zhuangyuzheng, w.j, liuwulong}@huawei.com
Pseudocode | Yes | We name our algorithm as Decentralized Adversarial Imitation Learning with Correlated policies (Correlated DAIL, a.k.a. CoDAIL) and present the training procedure in Appendix Algo. 1, which can be easily scaled to a distributed algorithm. [Appendix A.1 CoDAIL Algorithm, Algorithm 1 CoDAIL Algorithm] (A hedged sketch of this training loop is given after the table.)
Open Source Code | Yes | Our code is available at https://github.com/apexrl/CoDAIL.
Open Datasets | Yes | We test our method on the Particle World Environments (Lowe et al., 2017), which is a popular benchmark for evaluating multi-agent algorithms. (An environment-setup sketch is given after the table.)
Dataset Splits | No | The paper mentions using "200 episodes of demonstrations" for training but does not provide specific percentages or counts for training, validation, or test splits. It describes how the demonstrator policies and the imitating agents were trained, but not how the demonstration data itself was split for the imitation learning process.
Hardware Specification | No | The paper describes the model architecture (e.g., "two layer MLPs with 128 cells") and training parameters, but it does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used for running the experiments.
Software Dependencies | No | The paper mentions using the "K-FAC optimizer (Martens & Grosse, 2015)" and refers to modifications of "ACKTR (Wu et al., 2017)", but it does not provide specific version numbers for these or other software dependencies such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | During our experiments, we use two layer MLPs with 128 cells in each layer for policy networks, value networks, discriminator networks, and opponents model networks on all scenarios. The batch size is set to 1000. The policy is trained using the K-FAC optimizer (Martens & Grosse, 2015) with a learning rate of 0.1 and a small λ of 0.05. All other parameters for the K-FAC optimizer are the same as in Wu et al. (2017). We train each algorithm for 55000 epochs with 5 random seeds to obtain its average performance on all environments. [Appendix B, D] (The reported hyperparameters are collected in the configuration sketch after the table.)
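
To make the Pseudocode row more concrete, here is a minimal Python sketch of a CoDAIL-style decentralized training loop, based only on the high-level description quoted from the paper: each agent keeps a correlated policy, an opponent model, and a discriminator, and alternates their updates. Every name in it (`collect_rollout`, `agent.discriminator`, `agent.policy`, `agent.opponent_model`, the batch helpers) is a hypothetical stand-in rather than the authors' released API; Algorithm 1 in the paper's Appendix A.1 gives the exact procedure.

```python
# Minimal sketch of a CoDAIL-style training loop (cf. Algorithm 1, Appendix A.1).
# All helper objects and functions here are hypothetical stand-ins.

def train_codail(env, demos, agents, n_epochs, batch_size=1000):
    """Decentralized adversarial imitation learning with correlated policies."""
    for epoch in range(n_epochs):
        # Each agent i acts with a correlated policy pi_i(a_i | o_i, a_-i),
        # where opponents' actions a_-i come from its learned opponent model.
        batch = collect_rollout(env, agents, batch_size)

        for i, agent in enumerate(agents):
            # 1) Discriminator update: separate demonstrator (o, a_i, a_-i)
            #    samples from those generated by the current policies.
            agent.discriminator.update(expert=demos.sample(i),
                                       generated=batch.for_agent(i))

            # 2) Policy update: use a discriminator-based score as the
            #    imitation reward and take an actor-critic step (the paper
            #    uses ACKTR with a K-FAC optimizer; any policy-gradient
            #    method fits this sketch).
            rewards = agent.discriminator.imitation_reward(batch.for_agent(i))
            agent.policy.update(batch.for_agent(i), rewards)

            # 3) Opponent-model update: regress the opponents' observed
            #    actions so the correlated policy can condition on them.
            agent.opponent_model.update(batch.opponent_actions(i))
    return agents
```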
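
The Particle World Environments cited in the Open Datasets row are distributed as OpenAI's multiagent-particle-envs package. Below is a minimal loading sketch, assuming that package (plus numpy and gym) is installed; the scenario name `simple_tag` is only an illustrative choice and does not indicate which scenarios the paper evaluated.

```python
# Minimal sketch for loading one Particle World scenario, assuming the
# OpenAI multiagent-particle-envs package is installed. The scenario name
# below is illustrative only.
import numpy as np
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

scenario = scenarios.load("simple_tag.py").Scenario()
world = scenario.make_world()
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)

obs_n = env.reset()  # one observation per agent
# Random one-hot actions (valid for movement-only scenarios with Discrete spaces).
act_n = [np.eye(space.n)[np.random.randint(space.n)] for space in env.action_space]
obs_n, rew_n, done_n, info_n = env.step(act_n)
```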
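
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration plus a network constructor, sketched below. The use of PyTorch and the tanh activation are assumptions made for illustration; as the Software Dependencies row notes, the paper does not name a framework.

```python
import torch.nn as nn

# Hyperparameters reported in the paper (Appendix B, D).
HYPERPARAMS = {
    "hidden_layers": 2,
    "hidden_units": 128,
    "batch_size": 1000,
    "optimizer": "K-FAC",   # Martens & Grosse (2015), used as in ACKTR (Wu et al., 2017)
    "learning_rate": 0.1,
    "kfac_lambda": 0.05,
    "epochs": 55000,
    "random_seeds": 5,
}

def make_mlp(in_dim: int, out_dim: int) -> nn.Module:
    """Two-layer MLP with 128 units per hidden layer, the architecture the
    paper reports for policy, value, discriminator, and opponent-model
    networks. The framework (PyTorch) and activation (tanh) are assumptions."""
    h = HYPERPARAMS["hidden_units"]
    return nn.Sequential(
        nn.Linear(in_dim, h), nn.Tanh(),
        nn.Linear(h, h), nn.Tanh(),
        nn.Linear(h, out_dim),
    )
```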