Hybrid Learning for Multi-agent Cooperation with Sub-optimal Demonstrations
Authors: Peixi Peng, Junliang Xing, Lili Cao
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed approach on a real-time strategy combat game. Experimental results show that the approach outperforms many competing demonstration-based methods. |
| Researcher Affiliation | Academia | Peixi Peng¹,², Junliang Xing¹ and Lili Cao¹; ¹Institute of Automation, Chinese Academy of Sciences; ²Peking University |
| Pseudocode | Yes | Algorithm 1: The best response dynamics algorithm. Algorithm 2: The proposed learning algorithm. (A hedged sketch of best-response dynamics follows this table.) |
| Open Source Code | No | The paper provides no links, explicit statements, or supplementary-material references pointing to open-source code for the described methodology. |
| Open Datasets | Yes | Our approach is tested using SparCraft [Churchill et al., 2012], which is a simulator of the StarCraft local combat game and is widely adopted to test AI algorithms [Churchill and Buro, 2013; Lelis, 2017; Moraes and Lelis, 2018]. An additional experiment is conducted on the traffic junction task [Sukhbaatar and Fergus, 2016]. |
| Dataset Splits | No | The paper mentions a 'cross-validation setting' but does not provide specific details on the dataset splits (e.g., percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | Yes | The models are trained on a GeForce GTX 1080 and tested on a PC with one 2.4 GHz CPU and 8 GB RAM. |
| Software Dependencies | No | The paper mentions 'SGD' as an optimizer but does not specify any software libraries, frameworks, or their version numbers used in the implementation. |
| Experiment Setup | Yes | The iterations of Alg. 1, EUpdate of Alg. 2, and λ are set to 7, 500, and 0.995, respectively. The mean win rates and terminal hit-points reward over 100 battles are used as evaluation metrics. All networks are optimized by SGD with learning rate 10⁻³. (A hedged training-setup sketch follows this table.) |
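
The Pseudocode row cites a best-response dynamics routine (Algorithm 1). Below is a minimal Python sketch of generic best-response dynamics, not a reconstruction of the paper's algorithm: the helpers `q_value` and `action_space` are assumptions, and only the iteration count (7) comes from the reported setup.

```python
# Minimal sketch of best-response dynamics for joint action selection.
# ASSUMPTIONS: `q_value(i, action, others)` scores agent i's candidate
# action against the others' fixed actions; `action_space[i]` lists
# agent i's legal actions. Neither name comes from the paper.

def best_response_dynamics(agents, action_space, q_value, iterations=7):
    """Iteratively let each agent switch to its best response while the
    other agents' actions are held fixed (7 iterations per the paper)."""
    # Start from an arbitrary joint action (here: each agent's first action).
    joint_action = {i: action_space[i][0] for i in agents}
    for _ in range(iterations):
        for i in agents:
            others = {j: a for j, a in joint_action.items() if j != i}
            # Agent i's best response given the others' current actions.
            joint_action[i] = max(
                action_space[i],
                key=lambda a: q_value(i, a, others),
            )
    return joint_action

# Toy usage: two agents, two actions, a score that favors "attack".
agents = [0, 1]
action_space = {0: ["attack", "retreat"], 1: ["attack", "retreat"]}
score = lambda i, a, others: 1.0 if a == "attack" else 0.0
print(best_response_dynamics(agents, action_space, score))  # {0: 'attack', 1: 'attack'}
```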
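
The Experiment Setup row gives only optimizer-level hyperparameters. The fragment below restates them as code, assuming PyTorch purely for illustration (per the Software Dependencies row, the paper names no framework); the network shape is a placeholder.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder network; the paper's actual architectures are not reproduced here.
policy_net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))

# "All networks are optimized by SGD with learning rate 10^-3."
optimizer = optim.SGD(policy_net.parameters(), lr=1e-3)

lam = 0.995        # λ from the reported setup; its exact role is defined in the paper
br_iterations = 7  # iterations of Alg. 1 (best-response dynamics)
e_update = 500     # EUpdate period of Alg. 2
```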