BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction
Authors: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite having merely 3M model parameters, BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147, demonstrating its exceptional performance in traffic agent simulation. |
| Researcher Affiliation | Collaboration | Zikang Zhou¹, Haibo Hu¹, Xinhong Chen¹, Jianping Wang¹, Nan Guan¹, Kui Wu², Yung-Hui Li³, Yu-Kai Huang⁴, Chun Jason Xue⁵. ¹City University of Hong Kong; ²University of Victoria; ³Hon Hai Research Institute; ⁴Carnegie Mellon University; ⁵Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | No | The paper describes the model architecture and training process in detail but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | Our code will also be made public after the paper is published. |
| Open Datasets | Yes | Our experiments are conducted on the Waymo Open Motion Dataset (WOMD) [15]. The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios. Each scenario includes 91-step observations sampled at 10 Hz, totaling 9.1 seconds. ... The data used in this work is the Waymo Open Motion Dataset, which is publicly available. |
| Dataset Splits | Yes | The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios. |
| Hardware Specification | Yes | We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31]. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer [31] but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The optimal patch size we experimented with is 10, corresponding to 1 second. All hidden sizes are set to 128. Each attention layer has 8 attention heads with 16 dimensions per head. ... The prediction head produces 16 modes per agent and time step. We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31]. The weight decay rate and dropout rate are both set to 0.1. The learning rate is initially set to 5×10⁻⁴ and decayed to 0 following a cosine annealing schedule [30]. |
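The training recipe quoted above (AdamW, weight decay 0.1, initial learning rate 5×10⁻⁴, cosine annealing to 0 over 30 epochs) can be sketched in PyTorch as follows. This is a minimal reproduction of the optimizer and schedule configuration only; the `nn.Linear` placeholder stands in for the ~3M-parameter BehaviorGPT model, whose architecture is not reproduced here, and the per-step stepping granularity of the scheduler is an assumption (the paper does not state whether annealing is per epoch or per iteration).

```python
import torch
import torch.nn as nn

# Placeholder module; the actual model is the ~3M-parameter BehaviorGPT
# (hidden size 128, 8 attention heads of 16 dims each, dropout 0.1).
model = nn.Linear(128, 128)

EPOCHS = 30  # as reported in the paper

# AdamW with lr = 5e-4 and weight decay = 0.1, per the experiment setup.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)

# Cosine annealing from 5e-4 down to 0 over the 30 training epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS, eta_min=0.0
)

for epoch in range(EPOCHS):
    # ... one pass over the WOMD training set (batch size 24) would go here ...
    scheduler.step()

# After T_max scheduler steps, the learning rate has annealed to eta_min (0).
final_lr = optimizer.param_groups[0]["lr"]
```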