MANSA: Learning Fast and Slow in Multi-Agent Systems

Authors: David Henry Mguni, Haojun Chen, Taher Jafferjee, Jianhong Wang, Longfei Yue, Xidong Feng, Stephen Marcus Mcaleer, Feifei Tong, Jun Wang, Yaodong Yang

ICML 2023

Reproducibility Variable — Result — LLM Response
Research Type — Experimental: We show empirically in Level-Based Foraging (LBF) and the StarCraft Multi-Agent Challenge (SMAC) that MANSA achieves fast, superior, and more reliable performance while making 40% fewer CL calls in SMAC and using CL for only 1% of calls in LBF. We performed a series of experiments to test whether MANSA (1) enables MARL to solve multi-agent problems while reducing the number of CL calls, (2) improves the performance of IL and reduces its failure modes, and (3) learns to optimise its use of CL under a CL call budget.
Researcher Affiliation — Collaboration: 1 Huawei R&D; 2 Institute for AI, Peking University; 3 University of Manchester; 4 University College London; 5 Independent Researcher. Correspondence to: <davidmguni@hotmail.com>, <j.wang@ucl.ac.uk>, <yaodong.yang@pku.edu.cn>.
Pseudocode — No: The paper describes the MANSA framework and its components textually, but it does not include any explicitly labelled pseudocode or algorithm blocks.

Open Source Code — No: The paper provides no links to source-code repositories and makes no explicit statement about a public release of the code for the described methodology.

Open Datasets — Yes: "We used the code accompanying the MARL benchmark study of Papoudakis et al. (2021) for the baselines." For these experiments, MANSA was tested in Level-Based Foraging (LBF) (Papoudakis et al., 2021) and the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019).

Dataset Splits — No: The paper mentions "training" and "testing" of models and evaluates "end-of-training win rates", but it does not give specific train/validation/test split details (e.g., percentages, sample counts, or an explicit splitting methodology).

Hardware Specification — No: The paper does not report any specific hardware used for the experiments, such as GPU or CPU models, memory, or cloud instance types.

Software Dependencies — No: The paper mentions using QMIX, IQL, and SAC as algorithms and refers to the code from Papoudakis et al. (2021), but it does not specify version numbers for any software, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup — Yes:
- Clip gradient norm: 1
- γE (extrinsic discount): 0.99
- λ: 0.95
- Learning rate: 1e-4
- Number of minibatches: 4
- Number of optimisation epochs: 4
- Number of parallel actors: 16
- Optimisation algorithm: Adam
- Rollout length: 128
- Sticky action probability: 0.25
- Use Generalized Advantage Estimation: True
- Coefficient of extrinsic reward: [1, 5]
- Coefficient of intrinsic reward: [1, 2, 5, 10, 20, 50]
- Global discount factor: 0.99
- Probability of terminating option: [0.5, 0.75, 0.8, 0.9, 0.95]
- L function output size: [2, 4, 8, 16, 32, 64, 128, 256]
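As a sketch of how this setup could be expressed in code, the values above split naturally into fixed hyperparameters and swept ranges (the bracketed lists). The dictionary keys and the `sample_config` helper below are hypothetical naming choices for illustration, not taken from the paper or an official MANSA config file; only the numeric values come from the table above.

```python
import random

# Fixed hyperparameters reported in the experiment-setup row (names are assumed).
FIXED = {
    "clip_grad_norm": 1,
    "gamma_extrinsic": 0.99,     # γE: discount for the extrinsic reward
    "gae_lambda": 0.95,          # λ for Generalized Advantage Estimation
    "learning_rate": 1e-4,
    "num_minibatches": 4,
    "num_opt_epochs": 4,
    "num_parallel_actors": 16,
    "optimizer": "Adam",
    "rollout_length": 128,
    "sticky_action_prob": 0.25,
    "use_gae": True,
    "global_discount": 0.99,
}

# Values reported as lists were swept over; one draw per key gives a full config.
SWEEP = {
    "extrinsic_reward_coef": [1, 5],
    "intrinsic_reward_coef": [1, 2, 5, 10, 20, 50],
    "option_termination_prob": [0.5, 0.75, 0.8, 0.9, 0.95],
    "L_output_size": [2, 4, 8, 16, 32, 64, 128, 256],
}

def sample_config(rng=random):
    """Return one complete config by sampling each swept hyperparameter."""
    cfg = dict(FIXED)
    cfg.update({key: rng.choice(values) for key, values in SWEEP.items()})
    return cfg
```

Seeding the sampler (e.g. `sample_config(random.Random(0))`) makes a sweep draw reproducible, which is the kind of detail a reproduction attempt would need to pin down.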