Learning to Cooperate with Humans using Generative Agents

Authors: Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon S. Du, Natasha Jaques

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method Generative Agent Modeling for Multi-agent Adaptation (GAMMA) on Overcooked, a challenging cooperative cooking game that has become a standard benchmark for zero-shot coordination. We conduct an evaluation with real human teammates, and the results show that GAMMA consistently improves performance, whether the generative model is trained on simulated populations or human datasets.
Researcher Affiliation | Academia | Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon S. Du*, Natasha Jaques* — University of Washington {yancheng, daphc, abhgupta, ssdu, nj}@cs.washington.edu
Pseudocode | No | The paper describes the methodology textually and provides an overview diagram (Figure 2), but it does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | See our website for human-AI study videos and an interactive demo. The training code is also available. Our demo website is https://sites.google.com/view/human-ai-gamma-2024/ and contains the code and more experiment results.
Open Datasets | Yes | We evaluate GAMMA using the Overcooked environment [1] as a popular benchmark for prior work on human-AI cooperation [1, 24, 29, 36, 37]. For the human dataset in the original Overcooked paper [1], their open-sourced dataset contains 16 joint human-human trajectories for the Cramped Room environment, 17 for Asymmetric Advantages, 16 for Coordination Ring, 12 for Forced Coordination, and 15 for Counter Circuit, each with length T = 1200.
Dataset Splits | Yes | To train a VAE on it, the dataset is split into a training dataset with 70% of the data and a validation dataset with the remaining 30%.
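The 70/30 split described above can be sketched as a simple shuffled partition over the joint trajectories. This is a hypothetical helper for illustration (function name, seeding, and rounding are assumptions, not the paper's actual code):

```python
import random

def split_trajectories(trajectories, train_frac=0.7, seed=0):
    """Partition a list of joint trajectories into train/validation sets.

    Illustrative sketch of the 70/30 split reported in the paper;
    the actual GAMMA training code may shuffle or round differently.
    """
    indices = list(range(len(trajectories)))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for reproducibility
    n_train = int(len(indices) * train_frac)  # floor of the 70% cut
    train = [trajectories[i] for i in indices[:n_train]]
    val = [trajectories[i] for i in indices[n_train:]]
    return train, val
```

For example, over the 76 human-human trajectories listed above (16 + 17 + 16 + 12 + 15), this yields a 53/23 train/validation partition.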
Hardware Specification | Yes | We conducted our main experiments on clusters of AMD EPYC 64-Core Processors and NVIDIA A40/L40 GPUs.
Software Dependencies | No | The paper mentions using PPO [25] and MAPPO [38] and provides hyperparameters in tables, but it does not specify software library names with their exact version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We provide information about the implementation details (Appendix B) and the hyperparameters used in our experiments (Appendix D) to help reproduce our results. Table 1: Hyperparameters for policy models and Table 2: Hyperparameters for VAE models (these tables list specific values for learning rate, batch size, epoch, etc.). Also, reward shaping for dish and soup pick-up is used for the first 100M steps to encourage exploration.
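The reward-shaping schedule mentioned above (a pick-up bonus active only for the first 100M environment steps) can be sketched as a step-gated bonus. This is a minimal illustrative sketch, assuming a simple hard cutoff; the exact shaping terms and any annealing in GAMMA's code may differ:

```python
def shaped_reward(base_reward, shaping_bonus, env_steps,
                  shaping_horizon=100_000_000):
    """Add an exploration shaping bonus (e.g., for dish/soup pick-up)
    only during the first `shaping_horizon` environment steps.

    Hypothetical sketch of the schedule described in the paper's
    experiment setup; not the authors' actual implementation.
    """
    if env_steps < shaping_horizon:
        return base_reward + shaping_bonus  # early training: encourage exploration
    return base_reward  # after 100M steps: sparse task reward only
```

A hard cutoff like this is a common choice when the shaping bonus is only needed to bootstrap exploration and would otherwise bias the converged policy.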