Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning

Authors: Muhammad A Rahman, Niklas Höpner, Filippos Christianos, Stefano V. Albrecht

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments evaluate GPL and various baselines in three multi-agent environments (Level-based foraging (Albrecht & Ramamoorthy, 2013), Wolfpack (Leibo et al., 2017), Fort Attack (Deka & Sycara, 2020)) for which we use different processes to specify when agents enter or leave the environment and their type assignments. We compare GPL against ablations of GPL that integrate agent models using input concatenation, a common approach used by prior works (Grover et al., 2018; Tacchetti et al., 2019); as well as two MARL approaches (MADDPG (Lowe et al., 2017), DGN (Jiang et al., 2019)). Our results show that both tested GPL variants achieve significantly higher returns than all other baselines in most learning tasks, and that GPL generalizes more effectively to previously unseen team sizes/compositions.
Researcher Affiliation | Academia | School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; University of Amsterdam, Amsterdam, Netherlands.
Pseudocode | Yes | A general overview of GPL’s architecture is provided in Figure 1 while the complete learning pseudocode is given in Appendix D.
Open Source Code | Yes | Implementation code can be found at https://github.com/uoe-agents/GPL
Open Datasets | No | The paper mentions conducting experiments in three multi-agent environments (Level-based foraging, Wolfpack, Fort Attack) and constructing 'a diverse set of teammate types for each environment'. It describes how agents enter and leave during episodes, but it does not provide concrete access information (e.g., specific links, DOIs, or citations to public datasets) for the data used for training. While the environments might be known, the specific data generated/used for training in this particular setup is not made publicly accessible with a link or citation.
Dataset Splits | No | The paper states: 'For testing, we increase the upper limit on team size to expose the learner against team configurations it has never encountered before. In all three environments, we specifically limit the team size to three agents during training time and increase this limit to five agents for testing.' This describes how training and testing scenarios differ in terms of team size, but it does not provide specific percentages or counts for training, validation, and test splits of a fixed dataset.
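
To make the quoted train/test protocol concrete, here is a minimal, hypothetical Python sketch. It only illustrates the team-size cap being raised from three (training) to five (testing) and is not the authors' environment code; the function name and the per-episode sampling are assumptions, since in the actual environments agents enter and leave during an episode.

```python
import random

# Hypothetical illustration (not the authors' code) of the quoted protocol:
# the maximum team size is capped at 3 agents during training and raised to
# 5 at test time, so the learner faces team configurations it never saw.
# In the real environments agents enter/leave *during* an episode; here we
# simply draw one team size per episode under the relevant cap.
def sample_team_size(training: bool, rng: random.Random) -> int:
    max_team_size = 3 if training else 5
    return rng.randint(1, max_team_size)  # learner plus current teammates

rng = random.Random(0)
print([sample_team_size(True, rng) for _ in range(5)])   # training: sizes <= 3
print([sample_team_size(False, rng) for _ in range(5)])  # testing: sizes <= 5
```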
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only generally refers to training and evaluation processes.
Software Dependencies | No | The paper mentions using 'GNN libraries (Wang et al., 2019)' but does not specify any software names with version numbers for reproducibility (e.g., Python version, specific library versions like PyTorch, TensorFlow, etc.).
Experiment Setup | Yes | We conduct experiments in three fully observable multi-agent environments with different game complexity: Level-based foraging (LBF), Wolfpack, Fort Attack. ... We compare GPL against ablations of GPL that integrate agent models using input concatenation... as well as two MARL approaches (MADDPG (Lowe et al., 2017), DGN (Jiang et al., 2019)). ... After every 160000 global training timesteps, GPL and baselines are stored and evaluated... GPL-SPI’s policy uses the Boltzmann distribution, $p_{\mathrm{SPI}}(a^i_t \mid s_t) \propto \exp\left(Q(s_t, a^i_t)/\tau\right)$, with $\tau$ being the temperature parameter.
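
For clarity, a minimal sketch of the Boltzmann action distribution quoted above for GPL-SPI: action probabilities proportional to exp(Q(s, a)/τ). This is an assumed NumPy illustration, not the authors' implementation; in GPL the Q-values would come from the learned joint action value model.

```python
import numpy as np

# Minimal sketch (assumed, not the authors' implementation) of GPL-SPI's
# Boltzmann policy: p_SPI(a_i | s_t) proportional to exp(Q(s_t, a_i) / tau).
def boltzmann_policy(q_values: np.ndarray, tau: float) -> np.ndarray:
    """Action probabilities from action-values with temperature tau."""
    logits = q_values / tau
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: lower tau concentrates probability on the highest-valued action.
q = np.array([1.2, 0.4, -0.3])
print(boltzmann_policy(q, tau=1.0))
print(boltzmann_policy(q, tau=0.1))
```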