Online Ad Hoc Teamwork under Partial Observability
Authors: Pengjie Gu, Mengchen Zhao, Jianye Hao, Bo An
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that ODITS significantly outperforms various baselines in widely used ad hoc teamwork tasks. In our experimental evaluation, by interacting with a small set of given teammates, the trained agents could robustly collaborate with diverse new teammates. Compared with various type-based baselines, ODITS reveals superior ad hoc teamwork performance. Moreover, our ablations show both the necessity of learning latent variables of teamwork situations and inferring the proxy representations of learned variables. |
| Researcher Affiliation | Collaboration | School of Computer Science and Engineering, Nanyang Technological University, Singapore (1); Noah's Ark Lab, Huawei (2); College of Intelligence and Computing, Tianjin University (3) |
| Pseudocode | Yes | Algorithm 1 ODITS Training Algorithm 2 ODITS Testing |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the described methodology. It mentions using 'the open-source implementation mentioned in (Raileanu et al., 2020)' for visualizing policy representations, which refers to a third-party tool. |
| Open Datasets | No | The paper describes generating its own 'teammate set' by utilizing 5 different MARL algorithms and then manually selecting and partitioning policies into training and testing sets. It does not provide access information (link, DOI, specific citation for download) for a publicly available dataset in the conventional sense. |
| Dataset Splits | No | The paper states, 'Finally, we randomly sampled 8 policies as the training set and the other 7 policies as the testing set.' It does not mention a separate validation set or specific percentages for the splits, nor does it describe cross-validation. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper has a section 'A.2 ARCHITECTURE, HYPERPARAMETERS, AND INFRASTRUCTURE' but it only describes hyperparameter settings and training procedures, not specific hardware components like CPU/GPU models or memory details. |
| Software Dependencies | No | The paper mentions using 'RMSprop', 'DQN algorithm', 'VDN', 'QMIX', and 'PyMARL' but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | It is conducted using RMSprop with a learning rate of 5e-4, α of 0.99, and with no momentum or weight decay. For the lambda value, we search over {1e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2}. We finally adopt λ_MI = 1e-3, λ_MI = 1e-3, and λ_MI = 5e-4 for Modified Coin Game, Predator Prey, and Save the City, respectively, since they induce the best performance compared with other values. For the dimension of the latent variables z_t^i and c_t^i, we search over {1, 2, 3, 5, 10} and finally adopt \|z\| = 10 in Save the City and \|z\| = 1 in the other environments. In addition, we set \|c\| = \|z\|. For exploration, we use ε-greedy with ε annealed from 1.0 to 0.05 over 50,000 time steps and kept constant for the rest of the training. Batches of 128 episodes are sampled from the replay buffer, and all components in the framework are trained together in an end-to-end fashion. (An illustrative configuration sketch follows the table.) |
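
The teammate split described in the Dataset Splits row can be illustrated with the following minimal sketch, assuming Python; the policy identifiers and the fixed seed are hypothetical placeholders, not values taken from the paper. Only the 8/7 random partition of 15 policies comes from the reported setup.

```python
# Illustrative only: randomly partition 15 pre-trained teammate policies
# into 8 training and 7 testing policies, as the paper describes.
import random

teammate_policies = [f"policy_{i}" for i in range(15)]  # hypothetical identifiers

rng = random.Random(0)  # fixed seed for a reproducible split (assumption)
shuffled = rng.sample(teammate_policies, k=len(teammate_policies))
train_policies = shuffled[:8]   # 8 policies used for training
test_policies = shuffled[8:]    # remaining 7 policies held out for testing

assert len(train_policies) == 8 and len(test_policies) == 7
```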
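
The Experiment Setup row can be read as the configuration sketch below, assuming PyTorch. The module `agent_net`, the dictionary names, and the `epsilon` helper are hypothetical; only the numeric values (learning rate, α, λ_MI per environment, latent dimensions, ε schedule, batch size) are the ones reported above.

```python
# A minimal sketch, assuming PyTorch, of the training configuration reported in the paper.
import torch

agent_net = torch.nn.Linear(16, 4)  # stand-in for the actual ODITS networks

# RMSprop with lr 5e-4, alpha 0.99, no momentum or weight decay.
optimizer = torch.optim.RMSprop(
    agent_net.parameters(),
    lr=5e-4,
    alpha=0.99,
    momentum=0.0,
    weight_decay=0.0,
)

# Per-environment values adopted after the hyperparameter search.
LAMBDA_MI = {"modified_coin_game": 1e-3, "predator_prey": 1e-3, "save_the_city": 5e-4}
LATENT_DIM = {"modified_coin_game": 1, "predator_prey": 1, "save_the_city": 10}  # |z| = |c|

BATCH_EPISODES = 128  # episodes sampled per update from the replay buffer

def epsilon(step, start=1.0, end=0.05, anneal_steps=50_000):
    """Linear epsilon-greedy schedule: 1.0 -> 0.05 over 50k steps, then constant."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)
```

As the row notes, all components would then be optimized together end-to-end with this single optimizer and schedule.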