LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning

Authors: David Henry Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez-Nieves, Oliver Slumbers, Feifei Tong, Yang Li, Jiangcheng Zhu, Yaodong Yang, Jun Wang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate its superior performance in challenging tasks in Foraging and StarCraft II." The paper reports a series of experiments on the Level-based Foraging environment (Papoudakis et al., 2020) to test whether LIGS (1) efficiently promotes joint exploration, (2) optimises convergence points by inducing coordination, and (3) handles sparse-reward environments. In all tasks, LIGS is compared against MAPPO (Yu et al., 2021) and QMIX (Rashid et al., 2018), the intrinsic-reward MARL algorithms LIIR (Du et al., 2019) and LICA (Zhou et al., 2020a), and a leading MARL exploration algorithm, MAVEN (Mahajan et al., 2019). LIGS is then compared against the same baselines on the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019). An environment-setup sketch is given after this table.
Researcher Affiliation | Collaboration | Huawei Technologies; Imperial College London; ShanghaiTech University; Institute for AI, Peking University & BIGAI; University College London
Pseudocode | No | The paper states that it 'give[s] the full code of the algorithm in Sec. 9 of the Appendix'; however, the provided text does not include Section 9 of the Appendix, nor does the main body contain any visibly structured pseudocode or algorithm block.
Open Source Code | No | The paper mentions that it 'give[s] the full code of the algorithm in Sec. 9 of the Appendix', but it provides no link to a public repository (e.g., GitHub) and does not state that the code is open source or publicly released beyond placing it in an appendix that is not included in the provided text.
Open Datasets | Yes | We performed a series of experiments on the Level-based Foraging environment (Papoudakis et al., 2020) to test if LIGS... We then compared LIGS against these baselines on the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019). Both benchmarks are publicly available.
Dataset Splits | No | The paper describes the environments used (Level-based Foraging, the StarCraft Multi-Agent Challenge) but does not provide details on how data were split into training, validation, or test sets (e.g., percentages, sample counts, or an explicit cross-validation scheme).
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU or CPU models, memory, or cloud instance types) used to run the experiments. It only mentions 'We ran 3 seeds of each algorithm'.
Software Dependencies | No | 'In our implementation, we used proximal policy optimization (PPO) (Schulman et al., 2017) as the learning algorithm for both the Generator's intervention policy g_c and the Generator's policy g. For the N agents we used MAPPO (Yu et al., 2021).' The paper names specific algorithms (PPO, MAPPO) but gives no version numbers for these or for any other software libraries, frameworks, or programming languages used. A hedged sketch of the described reward-augmentation setup follows this table.
Experiment Setup | No | The paper describes some methodological choices, such as using PPO and MAPPO as the learning algorithms and constructing F_θ with a fixed neural network. However, it lacks the specifics of the experimental setup, such as hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, and other system-level training configuration needed for reproducibility; an illustrative configuration skeleton listing these missing fields follows this table.
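
Both benchmarks named in the paper are public Python packages, so the environments themselves can be re-created even though the paper's exact settings are not given. The sketch below shows typical instantiation; the scenario IDs ("Foraging-8x8-2p-1f-v2", map "3m") and the older Gym reset/step API are assumptions, since the paper does not specify which configurations or package versions were used.

```python
# Hedged sketch: instantiating the two public benchmarks named in the paper.
# Scenario IDs and the older Gym API are assumptions, not the authors' settings.
import gym
import numpy as np
import lbforaging  # noqa: F401  -- importing registers the Foraging-* Gym envs
from smac.env import StarCraft2Env

# Level-based Foraging (Papoudakis et al., 2020)
lbf_env = gym.make("Foraging-8x8-2p-1f-v2")
obs = lbf_env.reset()                       # tuple of per-agent observations
actions = lbf_env.action_space.sample()     # one discrete action per agent
obs, rewards, dones, info = lbf_env.step(actions)

# StarCraft Multi-Agent Challenge (Samvelyan et al., 2019); needs a local SC2 install
smac_env = StarCraft2Env(map_name="3m")
env_info = smac_env.get_env_info()
smac_env.reset()
joint_action = []
for agent_id in range(env_info["n_agents"]):
    avail = smac_env.get_avail_agent_actions(agent_id)  # 0/1 mask of legal actions
    joint_action.append(int(np.random.choice(np.nonzero(avail)[0])))
reward, terminated, info = smac_env.step(joint_action)
smac_env.close()
```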
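
The Software Dependencies response describes the implementation only at the level of "PPO for the Generator's policies, MAPPO for the agents, and a fixed network F_θ for the intrinsic reward". A minimal sketch of that reward-augmentation pattern is shown below; it illustrates the general mechanism rather than the authors' code. The network shapes, the intrinsic-reward scale, and the `intervene`/`reward_index` interface (standing in for the Generator's PPO-trained policies g_c and g) are all assumptions.

```python
# Hedged sketch of the reward-augmentation pattern: extrinsic rewards are
# supplemented with an intrinsic term read off a fixed (untrained) network
# F_theta, gated by a Generator decision. The Generator's PPO training and the
# MAPPO updates for the N agents are omitted; shapes and scales are assumptions.
import torch
import torch.nn as nn


class FixedIntrinsicNetwork(nn.Module):
    """F_theta: a randomly initialised network that is never trained."""

    def __init__(self, obs_dim: int, n_outputs: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_outputs)
        )
        for p in self.parameters():          # freeze: F_theta stays fixed
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def augmented_reward(extrinsic_r: float,
                     obs: torch.Tensor,
                     f_theta: FixedIntrinsicNetwork,
                     intervene: bool,
                     reward_index: int,
                     scale: float = 0.1) -> float:
    """Combine extrinsic and intrinsic reward for one agent at one step.

    `intervene` stands in for the Generator's intervention policy g_c and
    `reward_index` for its selection policy g; both would be sampled from
    PPO-trained policies in the full algorithm.
    """
    if not intervene:
        return extrinsic_r
    intrinsic = f_theta(obs)[reward_index].item()
    return extrinsic_r + scale * intrinsic
```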
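
As the Experiment Setup response notes, the system-level training configuration needed for reproduction is not reported. The skeleton below lists the kind of fields a re-implementation would have to fill in; every value is deliberately left unset because the paper does not provide them, and the field names are conventional PPO/MAPPO hyperparameters rather than the authors' own configuration format.

```python
# Configuration skeleton for a re-implementation. All values are None on
# purpose: the paper does not report them. Field names are conventional
# PPO/MAPPO hyperparameters, not the authors' configuration format.
EXPERIMENT_CONFIG = {
    "env_name": None,                 # e.g. an lbforaging scenario ID or SMAC map
    "n_agents": None,
    "total_env_steps": None,
    "rollout_length": None,
    "batch_size": None,
    "ppo_epochs": None,
    "clip_range": None,
    "learning_rate_agents": None,     # MAPPO actor/critic optimizer settings
    "learning_rate_generator": None,  # PPO optimizer for g and g_c
    "gamma": None,
    "gae_lambda": None,
    "entropy_coef": None,
    "intrinsic_reward_scale": None,   # weighting of F_theta's output
    "seeds": [None, None, None],      # the paper states only that 3 seeds were run
}
```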