LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning
Authors: David Henry Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez-Nieves, Oliver Slumbers, Feifei Tong, Yang Li, Jiangcheng Zhu, Yaodong Yang, Jun Wang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its superior performance in challenging tasks in Foraging and StarCraft II. We performed a series of experiments on the Level-based Foraging environment (Papoudakis et al., 2020) to test if LIGS: 1. efficiently promotes joint exploration; 2. optimises convergence points by inducing coordination; 3. handles sparse-reward environments. In all tasks, we compared the performance of LIGS against MAPPO (Yu et al., 2021) and QMIX (Rashid et al., 2018); the intrinsic-reward MARL algorithms LIIR (Du et al., 2019) and LICA (Zhou et al., 2020a); and a leading MARL exploration algorithm, MAVEN (Mahajan et al., 2019). We then compared LIGS against these baselines in StarCraft Micromanagement II (SMAC) (Samvelyan et al., 2019). |
| Researcher Affiliation | Collaboration | Huawei Technologies; Imperial College London; ShanghaiTech University; Institute for AI, Peking University & BIGAI; University College London |
| Pseudocode | No | The paper states that it 'give[s] the full code of the algorithm in Sec. 9 of the Appendix'. However, the provided text does not include Section 9 of the Appendix, nor does it contain any structured pseudocode or algorithm blocks in the main body. |
| Open Source Code | No | The paper mentions giving 'the full code of the algorithm in Sec. 9 of the Appendix', but it does not provide a direct link to a public repository (e.g., GitHub), nor does it explicitly state that the code is open source or publicly released beyond being in an appendix that was not provided in the prompt. |
| Open Datasets | Yes | We performed a series of experiments on the Level-based Foraging environment (Papoudakis et al., 2020) to test if LIGS... We then compared LIGS against these baselines in StarCraft Micromanagement II (SMAC) (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper describes the environments used (Level-based Foraging, StarCraft Micromanagement II) but does not provide specific details on how the data were split into training, validation, or test sets (e.g., percentages, sample counts, or explicit cross-validation schemes). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., specific GPU or CPU models, memory, or cloud instance types) used for running the experiments. It only mentions 'We ran 3 seeds of each algorithm'. |
| Software Dependencies | No | In our implementation, we used proximal policy optimization (PPO) (Schulman et al., 2017) as the learning algorithm for both the Generator's intervention policy gc and the Generator's policy g. For the N agents we used MAPPO (Yu et al., 2021). The paper names the algorithms used (PPO, MAPPO) but does not provide version numbers for these or for any other software libraries, frameworks, or programming languages used. A hedged sketch of this reward-generation setup follows the table. |
| Experiment Setup | No | The paper describes some methodological choices like using PPO and MAPPO for learning algorithms and constructing Fθ using a fixed neural network. However, it lacks specific details about the experimental setup such as hyperparameter values (e.g., learning rate, batch size, number of epochs), optimizer settings, or other system-level training configurations necessary for reproducibility. |
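
The Software Dependencies and Experiment Setup rows above describe the implementation only at the level of algorithm names: PPO for the Generator's two policies and MAPPO for the agents, with no library versions or hyperparameters. The snippet below is a minimal sketch of the reward-generation step that description implies; it is not the authors' code. PyTorch, the `GeneratorNet` architecture, the greedy switch decision, and all shapes (`obs_dim`, `n_agents`, `hidden`) are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class GeneratorNet(nn.Module):
    """Hypothetical Generator: maps the joint observation to one intrinsic
    reward per agent plus a binary 'intervene or not' logit."""
    def __init__(self, obs_dim: int, n_agents: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim * n_agents, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.intrinsic_head = nn.Linear(hidden, n_agents)  # one intrinsic reward per agent
        self.switch_head = nn.Linear(hidden, 2)            # intervene vs. do nothing

    def forward(self, joint_obs: torch.Tensor):
        z = self.body(joint_obs)
        return self.intrinsic_head(z), self.switch_head(z)

def shaped_rewards(gen: GeneratorNet,
                   joint_obs: torch.Tensor,
                   ext_rewards: torch.Tensor) -> torch.Tensor:
    """Add the Generator's intrinsic rewards to the environment rewards,
    but only when the switch head chooses to intervene."""
    intrinsic, switch_logits = gen(joint_obs)
    intervene = switch_logits.argmax(dim=-1, keepdim=True).float()  # greedy stand-in
    return ext_rewards + intervene * intrinsic

# Toy usage: 3 agents, each with a 5-dimensional observation, sparse reward.
gen = GeneratorNet(obs_dim=5, n_agents=3)
joint_obs = torch.randn(1, 15)    # batch of one joint observation
ext_rewards = torch.zeros(1, 3)   # all-zero (sparse) extrinsic reward
print(shaped_rewards(gen, joint_obs, ext_rewards))
```

In the paper's formulation both the intrinsic-reward policy and the intervention (switching) policy are trained with PPO while the agents learn with MAPPO; the greedy argmax above only stands in for the learned switch so the sketch stays self-contained.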