Context-Aware Sparse Deep Coordination Graphs

Authors: Tonghan Wang, Liang Zeng, Weijun Dong, Qianlan Yang, Yang Yu, Chongjie Zhang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To empirically evaluate our method, we present the Multi-Agent COordination (MACO) benchmark by collecting classic coordination problems in the literature, increasing their difficulty, and classifying them into different types. We carry out a case study and experiments on the MACO and StarCraft II micromanagement benchmarks to demonstrate the dynamics of sparse graph learning, the influence of graph sparseness, and the learning performance of our method.
Researcher Affiliation | Academia | Tonghan Wang1*, Liang Zeng1*, Weijun Dong1, Qianlan Yang1, Yang Yu2, Chongjie Zhang1. 1 Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University; 2 National Key Laboratory of Novel Software Technology, Nanjing University
Pseudocode | No | The paper describes algorithms using mathematical equations and textual explanations, but no structured pseudocode or algorithm blocks are provided. (An illustrative sketch of the Max-Sum action-selection step the paper relies on appears after this table.)
Open Source Code | Yes | The MACO benchmark and codes are publicly available at https://github.com/TonghanWang/CASEC-MACO-benchmark.
Open Datasets | Yes | To evaluate our sparse graph learning algorithm, we collect classic coordination problems from the cooperative multi-agent learning literature, improve their difficulty, and classify them into different types. Then, 6 representative problems are selected and presented as a new benchmark called the Multi-Agent COordination (MACO) challenge (Table 1). We then test CASEC on the StarCraft II micromanagement benchmark (Samvelyan et al., 2019) to demonstrate its scalability and effectiveness.
Dataset Splits | No | The paper discusses training and testing, but does not explicitly provide details on how datasets were split into training, validation, and test sets. No percentages, counts, or specific split methodologies are mentioned.
Hardware Specification | Yes | All the experiments are carried out on NVIDIA Tesla P100 GPUs.
Software Dependencies | No | The paper mentions that 'both our method and all the considered baselines are implemented based on the open-sourced codebase PyMARL2'. However, it does not provide specific version numbers for PyMARL or other key software components such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For all experiments, the optimization is conducted using RMSprop with a learning rate of 5 × 10^-4, α of 0.99, RMSProp epsilon of 0.00001, and with no momentum or weight decay. For exploration, we use ϵ-greedy with ϵ annealed linearly from 1.0 to 0.05 over 50K time steps and kept constant for the rest of the training. Batches of 32 episodes are sampled from the replay buffer. The default iteration number of the Max-Sum algorithm is set to 5. The communication threshold depends on the number of agents and the task, and we set it to 0.3 on the map 5m_vs_6m and 0.35 on the map MMM2. We test the performance with different values (1e-3, 1e-4, and 1e-5) of the scaling weight of the sparseness loss L_sparse^{q_var} on Pursuit, and set it to 1e-4 for both the MACO and SMAC benchmarks. The whole framework is trained end-to-end on fully unrolled episodes.
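
To make the quoted setup concrete, here is a minimal sketch of the stated optimizer and exploration settings in PyTorch. Only the numeric values come from the paper; the config layout, the epsilon helper, and all variable names are illustrative assumptions, not the authors' code.

```python
import torch

# Hyperparameters quoted above; dictionary layout and key names are illustrative.
config = {
    "lr": 5e-4,                      # RMSprop learning rate
    "rmsprop_alpha": 0.99,           # smoothing constant alpha
    "rmsprop_eps": 1e-5,             # RMSprop epsilon
    "epsilon_start": 1.0,            # epsilon-greedy start value
    "epsilon_finish": 0.05,          # epsilon-greedy final value
    "epsilon_anneal_steps": 50_000,  # linear anneal horizon
    "batch_size_episodes": 32,       # episodes sampled per update
    "max_sum_iterations": 5,         # Max-Sum message-passing rounds
    "sparseness_loss_weight": 1e-4,  # scaling of L_sparse^{q_var}
    "communication_threshold": {"5m_vs_6m": 0.3, "MMM2": 0.35},
}

# Optimizer matching the stated RMSprop settings (stand-in parameter for brevity).
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.RMSprop(
    params,
    lr=config["lr"],
    alpha=config["rmsprop_alpha"],
    eps=config["rmsprop_eps"],
    momentum=0.0,
    weight_decay=0.0,
)

def epsilon(t: int) -> float:
    """Epsilon annealed linearly from 1.0 to 0.05 over 50K steps, constant after."""
    frac = min(t / config["epsilon_anneal_steps"], 1.0)
    return config["epsilon_start"] + frac * (config["epsilon_finish"] - config["epsilon_start"])
```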
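
The setup above fixes the Max-Sum iteration count at 5, but the paper provides no pseudocode, so the following is a generic sketch of Max-Sum message passing on a sparse coordination graph with individual utilities q_i and pairwise payoffs q_ij. It reconstructs the standard algorithm under those definitions, not the authors' implementation; every function and variable name here is hypothetical.

```python
import numpy as np

def max_sum(utilities, payoffs, edges, n_iters=5):
    """Approximate the joint argmax over a sparse coordination graph.

    utilities: (n_agents, n_actions) array of individual utilities q_i(a_i).
    payoffs:   dict mapping each edge (i, j), i < j, to an
               (n_actions, n_actions) array q_ij(a_i, a_j).
    edges:     list of (i, j) pairs with i < j (the sparse graph).
    n_iters:   message-passing rounds (the paper's default is 5).
    """
    n_agents, n_actions = utilities.shape
    neighbors = {i: [] for i in range(n_agents)}
    messages = {}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
        messages[(i, j)] = np.zeros(n_actions)
        messages[(j, i)] = np.zeros(n_actions)

    for _ in range(n_iters):
        new_messages = {}
        for i, j in edges:
            for src, dst in ((i, j), (j, i)):
                # Orient the payoff table so rows index the sender's action.
                payoff = payoffs[(i, j)] if src == i else payoffs[(i, j)].T
                incoming = sum(messages[(k, src)] for k in neighbors[src] if k != dst)
                base = utilities[src] + incoming             # shape (n_actions,)
                msg = (base[:, None] + payoff).max(axis=0)   # maximize over sender's action
                new_messages[(src, dst)] = msg - msg.mean()  # normalize for stability
        messages = new_messages

    # Each agent picks the action maximizing its utility plus incoming messages.
    return [
        int(np.argmax(utilities[i] + sum(messages[(k, i)] for k in neighbors[i])))
        for i in range(n_agents)
    ]
```

On a three-agent chain, for instance, `max_sum(utilities, payoffs, [(0, 1), (1, 2)])` returns one greedy joint action; in CASEC the edge list itself would come from thresholding learned payoff statistics with the communication threshold quoted above.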