BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

Authors: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Despite having merely 3M model parameters, BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147, demonstrating its exceptional performance in traffic agent simulation.
Researcher Affiliation | Collaboration | Zikang Zhou (City University of Hong Kong), Haibo Hu (City University of Hong Kong), Xinhong Chen (City University of Hong Kong), Jianping Wang (City University of Hong Kong), Nan Guan (City University of Hong Kong), Kui Wu (University of Victoria), Yung-Hui Li (Hon Hai Research Institute), Yu-Kai Huang (Carnegie Mellon University), Chun Jason Xue (Mohamed bin Zayed University of Artificial Intelligence)
Pseudocode | No | The paper describes the model architecture and training process in detail but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | Our code will also be made public after the paper is published.
Open Datasets | Yes | Our experiments are conducted on the Waymo Open Motion Dataset (WOMD) [15]. The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios. Each scenario includes 91-step observations sampled at 10 Hz, totaling 9.1 seconds. ... The data used in this work is the Waymo Open Motion Dataset, which is publicly available.
Dataset Splits | Yes | The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios. (See the split-arithmetic sketch below the table.)
Hardware Specification | Yes | We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31].
Software Dependencies | No | The paper mentions using the AdamW optimizer [31] but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The optimal patch size we experimented with is 10, corresponding to 1 second. All hidden sizes are set to 128. Each attention layer has 8 attention heads with 16 dimensions per head. ... The prediction head produces 16 modes per agent and time step. We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31]. The weight decay rate and dropout rate are both set to 0.1. The learning rate is initially set to 5 × 10⁻⁴ and decayed to 0 following a cosine annealing schedule [30]. (See the training-setup sketch below the table.)
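
The figures quoted in the Dataset Splits row are easy to sanity-check. Below is a trivial Python sketch, plain arithmetic on the quoted numbers rather than code from the paper, confirming the split proportions and the 91-step/10 Hz/9.1 s relationship:

```python
# Plain arithmetic on the quoted WOMD figures; not the authors' code.
splits = {"train": 486_995, "val": 44_097, "test": 44_920}
total = sum(splits.values())  # 576,012 scenarios in total

for name, count in splits.items():
    print(f"{name}: {count} scenarios ({count / total:.1%})")
# -> train: 486995 scenarios (84.5%)
# -> val: 44097 scenarios (7.7%)
# -> test: 44920 scenarios (7.8%)

# 91 observation steps sampled at 10 Hz span 9.1 seconds, matching the quote.
steps, sample_rate_hz = 91, 10
print(f"{steps} steps / {sample_rate_hz} Hz = {steps / sample_rate_hz} s")  # 9.1 s
```

So the splits correspond to roughly 84.5%/7.7%/7.8% of the 576,012 total scenarios.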
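The Experiment Setup row likewise maps onto a standard training configuration. Since the paper's code is not yet public, the following is only a minimal PyTorch sketch wiring up the quoted hyperparameters (hidden size 128, 8 attention heads, dropout 0.1, AdamW with weight decay 0.1, learning rate 5 × 10⁻⁴ cosine-annealed to 0 over 30 epochs); the backbone module, step count, and dummy batch are placeholders, not BehaviorGPT itself:

```python
import torch
import torch.nn as nn

# Architecture hyperparameters quoted in the Experiment Setup row.
HIDDEN_SIZE = 128  # all hidden sizes
NUM_HEADS = 8      # 8 heads x 16 dims per head = 128
NUM_MODES = 16     # modes per agent/time step (listed for reference, unused here)
PATCH_SIZE = 10    # 10 steps at 10 Hz = 1-second patches (reference only)

# Placeholder backbone standing in for BehaviorGPT's ~3M-parameter model.
model = nn.TransformerEncoderLayer(
    d_model=HIDDEN_SIZE, nhead=NUM_HEADS, dropout=0.1, batch_first=True
)

# Optimization settings from the quote: AdamW, weight decay 0.1,
# lr 5e-4 decayed to 0 with cosine annealing over 30 epochs.
EPOCHS = 30
STEPS_PER_EPOCH = 100  # placeholder; depends on dataset size and batch size
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS * STEPS_PER_EPOCH, eta_min=0.0
)

# Dummy input: batch size 24 (as quoted), 91 time steps, hidden-size features.
batch = torch.randn(24, 91, HIDDEN_SIZE)

for epoch in range(EPOCHS):
    for _ in range(STEPS_PER_EPOCH):
        loss = model(batch).pow(2).mean()  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # anneal the learning rate toward 0 each step
```

Replicating the reported setup would additionally require distributing training across the 8 RTX 4090 GPUs (e.g., with DistributedDataParallel), which this sketch omits.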