Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction
Authors: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite having merely 3M model parameters, BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147, demonstrating its exceptional performance in traffic agent simulation. |
| Researcher Affiliation | Collaboration | Zikang Zhou¹, Haibo Hu¹, Xinhong Chen¹, Jianping Wang¹, Nan Guan¹, Kui Wu², Yung-Hui Li³, Yu-Kai Huang⁴, Chun Jason Xue⁵; ¹City University of Hong Kong, ²University of Victoria, ³Hon Hai Research Institute, ⁴Carnegie Mellon University, ⁵Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | No | The paper describes the model architecture and training process in detail but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | Our code will also be made public after the paper is published. |
| Open Datasets | Yes | Our experiments are conducted on the Waymo Open Motion Dataset (WOMD) [15]. The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios. Each scenario includes 91-step observations sampled at 10 Hz, totaling 9.1 seconds. ... The data used in this work is the Waymo Open Motion Dataset, which is publicly available. |
| Dataset Splits | Yes | The dataset comprises 486,995/44,097/44,920 training/validation/testing scenarios. |
| Hardware Specification | Yes | We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31]. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer [31] but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The optimal patch size we experimented with is 10, corresponding to 1 second. All hidden sizes are set to 128. Each attention layer has 8 attention heads with 16 dimensions per head. ... The prediction head produces 16 modes per agent and time step. We train the models for 30 epochs on 8 NVIDIA RTX 4090 GPUs with a batch size of 24, utilizing the AdamW optimizer [31]. The weight decay rate and dropout rate are both set to 0.1. The learning rate is initially set to 5 × 10⁻⁴ and decayed to 0 following a cosine annealing schedule [30]. |
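
The Experiment Setup row quotes enough hyperparameters to sketch the training configuration and the learning-rate decay. The snippet below is an illustrative reconstruction, not the authors' code (which the table marks as unreleased): `TrainConfig` and `cosine_lr` are hypothetical names, the schedule uses the standard cosine annealing formula, and stepping the schedule once per epoch is an assumption, since the quoted text does not say whether decay is applied per epoch or per iteration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    epochs: int = 30
    batch_size: int = 24
    base_lr: float = 5e-4      # initial learning rate
    weight_decay: float = 0.1
    dropout: float = 0.1
    hidden_size: int = 128
    num_heads: int = 8
    head_dim: int = 16
    patch_size: int = 10       # time steps per patch (1 second at 10 Hz)
    num_modes: int = 16        # modes per agent and time step


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Standard cosine annealing: base_lr at step 0, min_lr at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    cosine = math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


cfg = TrainConfig()
# Assumption: one scheduler step per epoch.
schedule = [cosine_lr(epoch, cfg.epochs, cfg.base_lr) for epoch in range(cfg.epochs + 1)]

if __name__ == "__main__":
    for epoch in (0, 15, 30):
        print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.6f}")
```

Under these assumptions the rate starts at the quoted initial value, passes through half of it at the midpoint, and reaches 0 at the final epoch. The quoted head configuration is also internally consistent: 8 heads of 16 dimensions each give the stated hidden size of 128.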