Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning

Authors: Muhammad A Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments evaluate GPL and various baselines in three multi-agent environments (Level-based foraging (Albrecht & Ramamoorthy, 2013), Wolfpack (Leibo et al., 2017), Fort Attack (Deka & Sycara, 2020)) for which we use different processes to specify when agents enter or leave the environment and their type assignments. We compare GPL against ablations of GPL that integrate agent models using input concatenation, a common approach used by prior works (Grover et al., 2018; Tacchetti et al., 2019); as well as two MARL approaches (MADDPG (Lowe et al., 2017), DGN (Jiang et al., 2019)). Our results show that both tested GPL variants achieve significantly higher returns than all other baselines in most learning tasks, and that GPL generalizes more effectively to previously unseen team sizes/compositions.
Researcher Affiliation | Academia | School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; University of Amsterdam, Amsterdam, Netherlands.
Pseudocode | Yes | A general overview of GPL’s architecture is provided in Figure 1 while the complete learning pseudocode is given in Appendix D.
Open Source Code | Yes | Implementation code can be found at https://github.com/uoe-agents/GPL
Open Datasets | No | The paper mentions conducting experiments in three multi-agent environments (Level-based foraging, Wolfpack, Fort Attack) and constructing 'a diverse set of teammate types for each environment'. It describes how agents enter and leave during episodes, but it does not provide concrete access information (e.g., specific links, DOIs, or citations to public datasets) for the data used for training. While the environments might be known, the specific data generated/used for training in this particular setup is not made publicly accessible with a link or citation.
Dataset Splits | No | The paper states: 'For testing, we increase the upper limit on team size to expose the learner against team configurations it has never encountered before. In all three environments, we specifically limit the team size to three agents during training time and increase this limit to five agents for testing.' This describes how training and testing scenarios differ in terms of team size, but it does not provide specific percentages or counts for training, validation, and test splits of a fixed dataset.
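
To make the quoted train/test protocol concrete, here is a minimal, hypothetical Python sketch. It only illustrates the team-size cap being raised from three (training) to five (testing) and is not the authors' environment code; the function name and the per-episode sampling are assumptions, since in the actual environments agents enter and leave during an episode.

```python
import random

# Hypothetical illustration (not the authors' code) of the quoted protocol:
# the maximum team size is capped at 3 agents during training and raised to
# 5 at test time, so the learner faces team configurations it never saw.
# In the real environments agents enter/leave *during* an episode; here we
# simply draw one team size per episode under the relevant cap.
def sample_team_size(training: bool, rng: random.Random) -> int:
    max_team_size = 3 if training else 5
    return rng.randint(1, max_team_size)  # learner plus current teammates

rng = random.Random(0)
print([sample_team_size(True, rng) for _ in range(5)])   # training: sizes <= 3
print([sample_team_size(False, rng) for _ in range(5)])  # testing: sizes <= 5
```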
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only generally refers to training and evaluation processes.
Software Dependencies | No | The paper mentions using 'GNN libraries (Wang et al., 2019)' but does not specify any software names with version numbers for reproducibility (e.g., Python version, specific library versions like PyTorch, TensorFlow, etc.).
Experiment Setup | Yes | We conduct experiments in three fully observable multi-agent environments with different game complexity: Level-based foraging (LBF), Wolfpack, Fort Attack. ... We compare GPL against ablations of GPL that integrate agent models using input concatenation... as well as two MARL approaches (MADDPG (Lowe et al., 2017), DGN (Jiang et al., 2019)). ... After every 160000 global training timesteps, GPL and baselines are stored and evaluated... GPL-SPI’s policy uses the Boltzmann distribution, $p_{\mathrm{SPI}}(a^i_t \mid s_t) \propto \exp\left(Q(s_t, a^i_t)/\tau\right)$, with $\tau$ being the temperature parameter.
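
For clarity, a minimal sketch of the Boltzmann action distribution quoted above for GPL-SPI: action probabilities proportional to exp(Q(s, a)/τ). This is an assumed NumPy illustration, not the authors' implementation; in GPL the Q-values would come from the learned joint action value model.

```python
import numpy as np

# Minimal sketch (assumed, not the authors' implementation) of GPL-SPI's
# Boltzmann policy: p_SPI(a_i | s_t) proportional to exp(Q(s_t, a_i) / tau).
def boltzmann_policy(q_values: np.ndarray, tau: float) -> np.ndarray:
    """Action probabilities from action-values with temperature tau."""
    logits = q_values / tau
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: lower tau concentrates probability on the highest-valued action.
q = np.array([1.2, 0.4, -0.3])
print(boltzmann_policy(q, tau=1.0))
print(boltzmann_policy(q, tau=0.1))
```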