reproducibilityindex.ai

BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

Authors: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Despite having merely 3M model parameters, BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147, demonstrating its exceptional performance in traffic agent simulation.
Researcher Affiliation	Collaboration	Zikang Zhou (1), Haibo Hu (1), Xinhong Chen (1), Jianping Wang (1), Nan Guan (1), Kui Wu (2), Yung-Hui Li (3), Yu-Kai Huang (4), Chun Jason Xue (5). (1) City University of Hong Kong; (2) University of Victoria; (3) Hon Hai Research Institute; (4) Carnegie Mellon University; (5) Mohamed bin Zayed University of Artificial Intelligence
Pseudocode	No	The paper describes the model architecture and training process in detail but does not include explicit pseudocode or algorithm blocks.
Open Source Code	No	Our code will also be made public after the paper is published.
Open Datasets	Yes	Our experiments are conducted on the Waymo Open Motion Dataset (WOMD) [15]. The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios. Each scenario includes 91-step observations sampled at 10 Hz, totaling 9.1 seconds. ... The data used in this work is the Waymo Open Motion Dataset, which is publicly available.
Dataset Splits	Yes	The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios.
Hardware Specification	Yes	We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31].
Software Dependencies	No	The paper mentions using the AdamW optimizer [31] but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup	Yes	The optimal patch size we experimented with is 10, corresponding to 1 second. All hidden sizes are set to 128. Each attention layer has 8 attention heads with 16 dimensions per head. ... The prediction head produces 16 modes per agent and time step. We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31]. The weight decay rate and dropout rate are both set to 0.1. The learning rate is initially set to 5 × 10^-4 and decayed to 0 following a cosine annealing schedule [30].
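
The Experiment Setup row quotes enough hyperparameters to sketch the learning-rate schedule and to sanity-check the quoted sizes. The snippet below is a minimal illustration in plain Python, not the authors' code (the table notes that none was released at the time). The function and constant names are hypothetical, and stepping the schedule once per epoch is an assumption: the quoted text does not say whether the decay is applied per epoch or per iteration, nor whether the batch size of 24 is global or per GPU.

```python
import math

# Values quoted in the table above.
BASE_LR = 5e-4        # initial learning rate
NUM_EPOCHS = 30       # training epochs
HIDDEN_SIZE = 128     # hidden size of every layer
NUM_HEADS = 8         # attention heads per layer
HEAD_DIM = 16         # dimensions per head
PATCH_SIZE = 10       # time steps per patch
SAMPLE_RATE_HZ = 10   # WOMD sampling rate
STEPS_PER_SCENARIO = 91


def cosine_annealed_lr(step, total_steps, base_lr=BASE_LR, min_lr=0.0):
    """Cosine annealing without restarts: base_lr at step 0, min_lr at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that out-of-range steps do not re-enter the cosine curve.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumption: one schedule step per epoch.
lr_per_epoch = [cosine_annealed_lr(epoch, NUM_EPOCHS) for epoch in range(NUM_EPOCHS + 1)]

# Consistency checks on the quoted numbers.
patch_seconds = PATCH_SIZE / SAMPLE_RATE_HZ               # 10 steps at 10 Hz
scenario_seconds = STEPS_PER_SCENARIO / SAMPLE_RATE_HZ    # 91 steps at 10 Hz
heads_match_hidden = NUM_HEADS * HEAD_DIM == HIDDEN_SIZE  # 8 * 16 vs 128
```

Under these assumptions the rate starts at 5 × 10^-4, passes through half that value at epoch 15, and reaches 0 at epoch 30. The size checks agree with the quoted text: 8 heads of 16 dimensions give the hidden size of 128, a patch of 10 steps spans 1 second, and 91 steps span 9.1 seconds.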