Learning to Cooperate with Humans using Generative Agents
Authors: Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon S. Du, Natasha Jaques
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method Generative Agent Modeling for Multi-agent Adaptation (GAMMA) on Overcooked, a challenging cooperative cooking game that has become a standard benchmark for zero-shot coordination. We conduct an evaluation with real human teammates, and the results show that GAMMA consistently improves performance, whether the generative model is trained on simulated populations or human datasets. |
| Researcher Affiliation | Academia | Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon S. Du*, Natasha Jaques* University of Washington {yancheng, daphc, abhgupta, ssdu, nj}@cs.washington.edu |
| Pseudocode | No | The paper describes the methodology textually and provides an overview diagram (Figure 2), but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | See our website for human-AI study videos and an interactive demo. The training code is also available. Our demo website is https://sites.google.com/view/human-ai-gamma-2024/ and contains the code and more experiment results. |
| Open Datasets | Yes | We evaluate GAMMA using the Overcooked environment [1] as a popular benchmark for prior work on human-AI cooperation [1, 24, 29, 36, 37]. For the human dataset in the original Overcooked paper [1], their open-sourced dataset contains 16 joint human-human trajectories for the Cramped Room environment, 17 for Asymmetric Advantages, 16 for Coordination Ring, 12 for Forced Coordination, and 15 for Counter Circuit, each with length T = 1200. |
| Dataset Splits | Yes | To train a VAE on it, the dataset is split into a training set with 70% of the data and a validation set with the remaining 30%. |
| Hardware Specification | Yes | We conducted our main experiments on clusters of AMD EPYC 64-Core Processor and NVIDIA A40/L40. |
| Software Dependencies | No | The paper mentions using PPO [25] and MAPPO [38] and provides hyperparameters in tables, but it does not specify software library names with their exact version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We provide information about the implementation details (Appendix B) and the hyperparameters used in our experiments (Appendix D) to help reproduce our results. Table 1: Hyperparameters for policy models and Table 2: Hyperparameters for VAE models (these tables list specific values for learning rate, batch size, epoch, etc.). Also, reward shaping for dish and soup pick-up is used for the first 100M steps to encourage exploration. |
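The 70/30 train/validation split reported for the VAE could be realized as a simple seeded shuffled partition of the joint trajectories. This is an illustrative sketch only; the function name, seeding scheme, and shuffle-before-split choice are assumptions, not details from the paper:

```python
import random

def split_trajectories(trajectories, train_frac=0.7, seed=0):
    """Partition a list of joint trajectories into train/validation subsets.

    Shuffles indices with a fixed seed (assumed here for reproducibility),
    then takes the first `train_frac` fraction as the training set.
    """
    rng = random.Random(seed)
    idx = list(range(len(trajectories)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train = [trajectories[i] for i in idx[:cut]]
    val = [trajectories[i] for i in idx[cut:]]
    return train, val

# e.g., the 16 Cramped Room trajectories would yield 11 train / 5 validation
train, val = split_trajectories(list(range(16)))
```

With 16 trajectories, `int(0.7 * 16)` gives 11 training and 5 validation trajectories; the actual split procedure used by the authors may differ (e.g., splitting at the timestep rather than trajectory level).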