Multi-Agent Interactions Modeling with Correlated Policies
Authors: Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Huawei Noah's Ark Lab {minghuanliu, mingak, wnzhang, yyu}@sjtu.edu.cn, {zhuangyuzheng, w.j, liuwulong}@huawei.com |
| Pseudocode | Yes | We name our algorithm as Decentralized Adversarial Imitation Learning with Correlated policies (Correlated DAIL, a.k.a. CoDAIL) and present the training procedure in Appendix Algo. 1, which can be easily scaled to a distributed algorithm. [Appendix A.1 CoDAIL Algorithm, Algorithm 1 CoDAIL Algorithm] (A high-level training-loop sketch appears after the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/apexrl/CoDAIL. |
| Open Datasets | Yes | We test our method on the Particle World Environments (Lowe et al., 2017), which is a popular benchmark for evaluating multi-agent algorithms. |
| Dataset Splits | No | The paper mentions using "200 episodes of demonstrations" for training but does not provide specific percentages or counts for training, validation, or test dataset splits. It describes how the demonstrator policies were trained and how the imitating agents were trained, but not how the demonstration *data* was split for the imitation learning process. |
| Hardware Specification | No | The paper describes the model architecture (e.g., "two layer MLPs with 128 cells") and training parameters, but it does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions using "K-FAC optimizer (Martens & Grosse, 2015)" and refers to modifications of "ACKTR (Wu et al., 2017)", but it does not provide specific version numbers for these or other software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | During our experiments, we use two-layer MLPs with 128 cells in each layer for the policy networks, value networks, discriminator networks and opponent-model networks on all scenarios. The batch size is set to 1000. The policy is trained using the K-FAC optimizer (Martens & Grosse, 2015) with a learning rate of 0.1 and a small λ of 0.05. All other parameters for the K-FAC optimizer are the same as in (Wu et al., 2017). We train each algorithm for 55000 epochs with 5 random seeds to obtain its average performance on all environments. [Appendix B, D] (The reported hyperparameters are also collected in a configuration sketch directly below the table.) |
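For concreteness, the quoted experiment-setup values can be collected into a small configuration sketch. The dictionary entries below come directly from the quote (two-layer 128-cell MLPs, batch size 1000, K-FAC learning rate 0.1, λ of 0.05, 55000 epochs, 5 seeds); the PyTorch framing, the ReLU activation, the helper name `make_mlp`, and the concrete seed values are assumptions made for illustration and are not stated in the paper.

```python
# Hypothetical sketch of the reported experiment configuration (not the authors' code).
import torch.nn as nn

CONFIG = {
    "hidden_sizes": (128, 128),   # two-layer MLPs with 128 cells per layer
    "batch_size": 1000,
    "kfac_learning_rate": 0.1,    # K-FAC optimizer (Martens & Grosse, 2015)
    "lambda": 0.05,               # the "small lambda" reported in the paper
    "epochs": 55000,
    "seeds": [0, 1, 2, 3, 4],     # 5 random seeds; the concrete values are assumed
}

def make_mlp(in_dim: int, out_dim: int, hidden=CONFIG["hidden_sizes"]) -> nn.Module:
    """Two-layer MLP used (per the paper) for policy, value, discriminator,
    and opponent-model networks; ReLU is an assumption for illustration."""
    layers, last = [], in_dim
    for h in hidden:
        layers += [nn.Linear(last, h), nn.ReLU()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)
```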
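The training-loop sketch referenced from the Pseudocode row is below: a minimal, hypothetical PyTorch rendering of one CoDAIL-style agent, meant only to convey the structure suggested by the paper (a correlated policy conditioned on modeled opponent actions, an opponent model, and a per-agent discriminator). It is not the authors' implementation: discrete actions, Adam, and a plain REINFORCE update in place of the paper's ACKTR/K-FAC optimization are simplifying assumptions, and all class and method names are invented.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

def mlp(in_dim: int, out_dim: int, hidden: int = 128) -> nn.Module:
    # Two-layer MLP with 128 cells per layer, matching the reported architecture.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class CoDAILAgentSketch:
    """One decentralized agent: a correlated policy pi_i(a_i | o_i, a_-i),
    an opponent model rho_i(a_-i | o_i), and a discriminator D_i(o_i, a_i, a_-i)."""

    def __init__(self, obs_dim: int, act_dim: int, opp_act_dim: int):
        self.policy = mlp(obs_dim + opp_act_dim, act_dim)
        self.opponent_model = mlp(obs_dim, opp_act_dim)
        self.discriminator = mlp(obs_dim + act_dim + opp_act_dim, 1)
        # Adam is a stand-in; the paper reports K-FAC/ACKTR for policy optimization.
        self.opt_pi = torch.optim.Adam(self.policy.parameters(), lr=1e-3)
        self.opt_rho = torch.optim.Adam(self.opponent_model.parameters(), lr=1e-3)
        self.opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=1e-3)

    def act(self, obs: torch.Tensor):
        # Sample opponent actions from the learned opponent model, then condition
        # the agent's own policy on them -- the "correlated policy" idea.
        opp_logits = self.opponent_model(obs)
        opp_onehot = nn.functional.one_hot(
            Categorical(logits=opp_logits).sample(), opp_logits.shape[-1]).float()
        dist = Categorical(logits=self.policy(torch.cat([obs, opp_onehot], dim=-1)))
        action = dist.sample()
        return action, dist.log_prob(action)

    def update_opponent_model(self, obs_batch, opp_action_batch):
        # Fit the opponent model by maximum likelihood on observed opponent actions
        # (opp_action_batch holds integer action indices).
        loss = nn.functional.cross_entropy(self.opponent_model(obs_batch), opp_action_batch)
        self.opt_rho.zero_grad(); loss.backward(); self.opt_rho.step()

    def update_discriminator(self, expert_batch, agent_batch):
        # GAIL-style binary classification on (obs, own action, opponent action)
        # tuples: label 1 for demonstrator data, 0 for generated data.
        bce = nn.BCEWithLogitsLoss()
        loss = (bce(self.discriminator(expert_batch), torch.ones(len(expert_batch), 1))
                + bce(self.discriminator(agent_batch), torch.zeros(len(agent_batch), 1)))
        self.opt_d.zero_grad(); loss.backward(); self.opt_d.step()

    def update_policy(self, log_probs, agent_batch):
        # Surrogate reward is high when the discriminator judges a generated tuple
        # to be demonstrator-like; REINFORCE replaces the paper's ACKTR update.
        with torch.no_grad():
            d = torch.sigmoid(self.discriminator(agent_batch)).squeeze(-1)
            reward = -torch.log(1.0 - d + 1e-8)
        loss = -(log_probs * reward).mean()
        self.opt_pi.zero_grad(); loss.backward(); self.opt_pi.step()
```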