MANSA: Learning Fast and Slow in Multi-Agent Systems

Authors: David Henry Mguni, Haojun Chen, Taher Jafferjee, Jianhong Wang, Longfei Yue, Xidong Feng, Stephen Marcus Mcaleer, Feifei Tong, Jun Wang, Yaodong Yang

ICML 2023

Reproducibility Variable — Result — LLM Response
Research Type — Experimental: We show empirically in Level-Based Foraging (LBF) and the StarCraft Multi-Agent Challenge (SMAC) that MANSA achieves fast, superior, and more reliable performance while making 40% fewer CL calls in SMAC and using CL for only 1% of calls in LBF. We performed a series of experiments to test whether MANSA (1) enables MARL to solve multi-agent problems while reducing the number of CL calls, (2) improves the performance of IL and reduces its failure modes, and (3) learns to optimise its use of CL under a CL call budget.
Researcher Affiliation — Collaboration: 1 Huawei R&D; 2 Institute for AI, Peking University; 3 University of Manchester; 4 University College London; 5 Independent Researcher. Correspondence to: <davidmguni@hotmail.com>, <j.wang@ucl.ac.uk>, <yaodong.yang@pku.edu.cn>.
Pseudocode — No: The paper describes the MANSA framework and its components textually, but it does not include any explicitly labelled pseudocode or algorithm blocks.

Open Source Code — No: The paper provides no links to source-code repositories and makes no explicit statement about a public release of the code for the described methodology.

Open Datasets — Yes: "We used the code accompanying the MARL benchmark study of Papoudakis et al. (2021) for the baselines." For these experiments, MANSA was tested in Level-Based Foraging (LBF) (Papoudakis et al., 2021) and the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019).

Dataset Splits — No: The paper mentions "training" and "testing" of models and evaluates "end-of-training win rates", but it does not give specific train/validation/test split details (e.g., percentages, sample counts, or an explicit splitting methodology).

Hardware Specification — No: The paper does not report any specific hardware used for the experiments, such as GPU or CPU models, memory, or cloud instance types.

Software Dependencies — No: The paper mentions using QMIX, IQL, and SAC as algorithms and refers to the code from Papoudakis et al. (2021), but it does not specify version numbers for any software, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup — Yes:
- Clip gradient norm: 1
- γE (extrinsic discount): 0.99
- λ: 0.95
- Learning rate: 1e-4
- Number of minibatches: 4
- Number of optimisation epochs: 4
- Number of parallel actors: 16
- Optimisation algorithm: Adam
- Rollout length: 128
- Sticky action probability: 0.25
- Use Generalized Advantage Estimation: True
- Coefficient of extrinsic reward: [1, 5]
- Coefficient of intrinsic reward: [1, 2, 5, 10, 20, 50]
- Global discount factor: 0.99
- Probability of terminating option: [0.5, 0.75, 0.8, 0.9, 0.95]
- L function output size: [2, 4, 8, 16, 32, 64, 128, 256]
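As a sketch of how this setup could be expressed in code, the values above split naturally into fixed hyperparameters and swept ranges (the bracketed lists). The dictionary keys and the `sample_config` helper below are hypothetical naming choices for illustration, not taken from the paper or an official MANSA config file; only the numeric values come from the table above.

```python
import random

# Fixed hyperparameters reported in the experiment-setup row (names are assumed).
FIXED = {
    "clip_grad_norm": 1,
    "gamma_extrinsic": 0.99,     # γE: discount for the extrinsic reward
    "gae_lambda": 0.95,          # λ for Generalized Advantage Estimation
    "learning_rate": 1e-4,
    "num_minibatches": 4,
    "num_opt_epochs": 4,
    "num_parallel_actors": 16,
    "optimizer": "Adam",
    "rollout_length": 128,
    "sticky_action_prob": 0.25,
    "use_gae": True,
    "global_discount": 0.99,
}

# Values reported as lists were swept over; one draw per key gives a full config.
SWEEP = {
    "extrinsic_reward_coef": [1, 5],
    "intrinsic_reward_coef": [1, 2, 5, 10, 20, 50],
    "option_termination_prob": [0.5, 0.75, 0.8, 0.9, 0.95],
    "L_output_size": [2, 4, 8, 16, 32, 64, 128, 256],
}

def sample_config(rng=random):
    """Return one complete config by sampling each swept hyperparameter."""
    cfg = dict(FIXED)
    cfg.update({key: rng.choice(values) for key, values in SWEEP.items()})
    return cfg
```

Seeding the sampler (e.g. `sample_config(random.Random(0))`) makes a sweep draw reproducible, which is the kind of detail a reproduction attempt would need to pin down.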