N-agent Ad Hoc Teamwork
Authors: Caroline Wang, Muhammad Arrasy Rahman, Ishan Durugkar, Elad Liebman, Peter Stone
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches, and enables out-of-distribution generalization to unseen teammates. |
| Researcher Affiliation | Collaboration | Caroline Wang (Department of Computer Science, The University of Texas at Austin, caroline.l.wang@utexas.edu); Arrasy Rahman (Department of Computer Science, The University of Texas at Austin, arrasy@cs.utexas.edu); Ishan Durugkar (Sony AI, ishan.durugkar@sony.com); Elad Liebman (Amazon, liebelad@amazon.com); Peter Stone (Department of Computer Science, The University of Texas at Austin and Sony AI, pstone@cs.utexas.edu) |
| Pseudocode | No | The paper describes the POAM algorithm and its components (Agent Modeling Network, Policy and Value Networks) in text and through a diagram (Figure 2), but it does not provide any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/carolinewang01/naht. |
| Open Datasets | Yes | Experiments are conducted on a predator-prey mpe-pp task implemented within the multi-agent particle environment [24], and the 5v6, 8v9, 10v11, and 3s5z tasks from the StarCraft Multi-Agent Challenge (SMAC) benchmark [34]. |
| Dataset Splits | No | The paper mentions tuning hyperparameters and applying them to tasks but does not specify distinct training, validation, and test splits with percentages, counts, or dedicated validation set usage for model selection or early stopping. It states 'We tune the hyperparameters of the policy gradient methods on the 5v6 task, and apply those parameters directly to the remaining SMAC tasks.' |
| Hardware Specification | Yes | The servers used for the experiments ran Ubuntu 20.04 with the following configurations: (1) Intel Xeon CPU E5-2630 v4 with Nvidia Titan V GPU; (2) Intel Xeon CPU E5-2698 v4 with Nvidia Tesla V100-SXM2 GPU; (3) Intel Xeon Gold 6342 CPU with Nvidia A40 GPU; (4) Intel Xeon Gold 6342 CPU with Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions using the EPyMARL codebase, base implementations of the IQL, VDN, QMIX, IPPO, and MAPPO algorithms, and the Adam optimizer, but it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | Optimizer and Neural Architecture. The Adam optimizer is applied for all networks involved. For policy gradient methods, the policy architecture is two fully connected layers, followed by an RNN (GRU) layer, followed by an output layer. Each layer has 64 neurons with ReLU activation units, and employs layer normalization. The critic architecture is the same as the policy architecture. [...] Table 1: Hyperparameters evaluated for the policy gradient algorithms. [...] Table 2: Additional hyperparameters evaluated for POAM; note that ED stands for encoder-decoder. Selected values are bolded. (A sketch of the described architecture follows the table below.) |
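
The Experiment Setup row describes the actor/critic trunk used by the policy gradient methods: two fully connected layers, a GRU layer, and an output layer, with 64 units per layer, ReLU activations, and layer normalization. The PyTorch sketch below is a minimal illustration of that description, not the authors' code; the class name, the exact placement of layer normalization, and the example input/output dimensions are assumptions.

```python
# Hypothetical sketch of the recurrent actor/critic trunk described in the
# Experiment Setup row: FC -> FC -> GRU -> output, 64 units per layer, ReLU,
# and layer normalization. Layer-norm placement is an assumption.
import torch
import torch.nn as nn


class RecurrentTrunk(nn.Module):
    def __init__(self, input_dim: int, output_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.ln1 = nn.LayerNorm(hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.ln2 = nn.LayerNorm(hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.ln3 = nn.LayerNorm(hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, obs: torch.Tensor, hidden: torch.Tensor):
        # Two feed-forward layers with layer norm and ReLU.
        x = torch.relu(self.ln1(self.fc1(obs)))
        x = torch.relu(self.ln2(self.fc2(x)))
        # Recurrent step over the episode, one observation at a time.
        hidden = self.rnn(x, hidden)
        return self.out(self.ln3(hidden)), hidden


# Per the table, the critic shares the same architecture as the policy;
# the policy head emits action logits and the critic head a scalar value.
# Dimensions below (obs dim 80, 10 actions) are illustrative only.
policy = RecurrentTrunk(input_dim=80, output_dim=10)
critic = RecurrentTrunk(input_dim=80, output_dim=1)
h = torch.zeros(1, 64)
logits, h = policy(torch.zeros(1, 80), h)
```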