Hybrid Learning for Multi-agent Cooperation with Sub-optimal Demonstrations

Authors: Peixi Peng, Junliang Xing, Lili Cao

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed approach on a real-time strategy combat game. Experimental results show that the approach outperforms many competing demonstration-based methods.
Researcher Affiliation | Academia | Peixi Peng¹,², Junliang Xing¹, and Lili Cao¹ (¹Institute of Automation, Chinese Academy of Sciences; ²Peking University)
Pseudocode | Yes | Algorithm 1: The best response dynamics algorithm. Algorithm 2: The proposed learning algorithm. (A generic best-response-dynamics sketch is given after this table.)
Open Source Code | No | The paper does not provide any links, explicit statements, or references to supplementary material containing open-source code for the described methodology.
Open Datasets | Yes | Our approach is tested using SparCraft [Churchill et al., 2012], which is a simulator of the StarCraft local combat game and is widely adopted to test AI algorithms [Churchill and Buro, 2013; Lelis, 2017; Moraes and Lelis, 2018]. An additional experiment is conducted on the traffic junction task [Sukhbaatar and Fergus, 2016].
Dataset Splits | No | The paper mentions a 'cross-validation setting' but does not provide specific details on the dataset splits (e.g., percentages or sample counts for the training, validation, and test sets).
Hardware Specification | Yes | The models are trained on a GeForce GTX 1080 and tested on a PC with one 2.4 GHz CPU and 8 GB RAM.
Software Dependencies | No | The paper mentions 'SGD' as an optimizer but does not specify any software libraries, frameworks, or their version numbers used in the implementation.
Experiment Setup | Yes | The iterations of Alg. 1, EUpdate of Alg. 2, and λ are set to 7, 500, and 0.995, respectively. The mean win rates and terminal hit-points reward over 100 battles are used as evaluation metrics. All networks are optimized by SGD with a learning rate of 10^-3. (A hypothetical configuration sketch is given below.)
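
The paper gives Algorithm 1 only as pseudocode. As a point of reference, a minimal, generic sketch of best-response dynamics is shown below; the agents, payoff, and candidate_policies interfaces are assumptions made for illustration, not the authors' exact procedure.

def best_response_dynamics(agents, payoff, candidate_policies, iterations=7):
    """Generic best-response dynamics.

    agents: list of agent ids.
    payoff(agent, joint_policy) -> float, where joint_policy maps agent -> policy.
    candidate_policies[agent]: list of policies the agent may choose from.
    """
    # Each agent in turn switches to the policy that maximizes its payoff while
    # the other agents' policies stay fixed; the sweep repeats for a fixed
    # number of iterations (the paper reports 7).
    joint = {a: candidate_policies[a][0] for a in agents}  # arbitrary starting point
    for _ in range(iterations):
        for a in agents:
            # Best response of agent a against the currently fixed joint policy.
            joint[a] = max(
                candidate_policies[a],
                key=lambda pi: payoff(a, {**joint, a: pi}),
            )
    return joint

In the paper's setting, the payoff would presumably be obtained by simulating the combat in SparCraft under the chosen joint policy; here it is left as an abstract callable.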
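
For the reported experiment setup, a rough configuration sketch follows. The paper does not name a framework, so the PyTorch calls and the placeholder network are assumptions; only the numeric values (SGD, learning rate 10^-3, λ = 0.995, 7 iterations of Alg. 1, EUpdate = 500) come from the paper.

import torch

# Placeholder policy network; the paper does not specify the architecture here.
policy_net = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 8),
)

# Values reported in the paper's experiment setup.
optimizer = torch.optim.SGD(policy_net.parameters(), lr=1e-3)  # SGD, learning rate 10^-3
LAMBDA = 0.995        # the λ coefficient of the learning algorithm
BR_ITERATIONS = 7     # iterations of Algorithm 1 (best response dynamics)
E_UPDATE = 500        # EUpdate of Algorithm 2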