SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout

Authors: Max Jiang, Yijing Bai, Andre Cornman, Christopher Davis, Xiukun Huang, Hong Jeon, Sakshum Kulshrestha, John Lambert, Shuangyu Li, Xuanyu Zhou, Carlos Fuertes, Chang Yuan, Mingxing Tan, Yin Zhou, Dragomir Anguelov

NeurIPS 2024

Reproducibility assessment (Variable: Result — supporting LLM response):
Research Type: Experimental — "We demonstrate the effectiveness of our approach on the Waymo Open Sim Agents Challenge, achieving top open-loop performance and the best closed-loop performance among diffusion models." (Sec. 4, Experimental Results)
Researcher Affiliation: Industry — "All the authors are employees of Waymo LLC."
Pseudocode: Yes — "We illustrate the three algorithms in Algorithms 1–3 using the same model trained with a noise mixture t ∼ {U(0, 1); t̂} (Eqn. 2). We also illustrate Algorithm 3 in Fig. 4." The three algorithms are: Algorithm 1, One-Shot (Open-Loop); Algorithm 2, Full AR (Closed-Loop); and Algorithm 3, Amortized AR (Closed-Loop).
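The compute contrast between the Full AR and Amortized AR rollouts named above can be illustrated with a toy sketch: Full AR reruns the entire denoising chain at every simulation step, while Amortized AR spreads one denoising update across each rollout step. The denoiser stub and all function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

CALLS = {"denoise": 0}  # count denoiser (network) evaluations


def denoise(x, t):
    """Stub denoiser: one counted network evaluation, pulling x toward 0."""
    CALLS["denoise"] += 1
    return x * t


def full_ar(x, horizon=8, steps=16):
    """Full AR (closed-loop): rerun the whole denoising chain per sim step,
    feeding each denoised step back in as the next step's input."""
    out = []
    for _ in range(horizon):
        step = x
        for t in np.linspace(1.0, 0.0, steps):
            step = denoise(step, t)
        out.append(step)
        x = step  # closed-loop feedback
    return np.stack(out)


def amortized_ar(x, horizon=8):
    """Amortized AR (closed-loop): amortize denoising over the rollout,
    performing a single denoising update per simulation step."""
    out = []
    for t in np.linspace(1.0, 0.0, horizon):
        x = denoise(x, t)
        out.append(x)
    return np.stack(out)


CALLS["denoise"] = 0
full_ar(np.zeros(4))
print("Full AR denoiser calls:", CALLS["denoise"])       # 8 * 16 = 128

CALLS["denoise"] = 0
amortized_ar(np.zeros(4))
print("Amortized AR denoiser calls:", CALLS["denoise"])  # 8
```

The toy numbers show why amortization matters for closed-loop efficiency: per rollout the stub Full AR costs `horizon × steps` network calls versus `horizon` for Amortized AR.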
Open Source Code: No — "We do not plan to release code in the near future."
Open Datasets: Yes — "We use the Waymo Open Motion Dataset (WOMD) [7] for both our scene generation and agent simulation experiments." and "our dataset is based on the Waymo Open Motion Dataset, which is already publicly accessible."
Dataset Splits: Yes — "Across the dataset splits, there exist 486,995 scenarios in train, 44,097 in validation, and 44,920 in test."
Hardware Specification: No — The paper mentions "computational resources" and reports compute GFLOPs in Figure 6, but does not specify details such as exact GPU/CPU models or memory used for the experiments.
Software Dependencies: No — The paper mentions software such as the Adafactor optimizer, Adam, and DPM++, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup: Yes — "Training details: train batch size of 1024, and train for 1.2M steps. We select the most competitive model based on validation set performance, for which we perform a final evaluation using the test set. We use an initial learning rate of 3 × 10⁻⁴. We use 16 diffusion sampling steps. When training, we mix the behavior prediction (BP) task with the scene generation task, with probability 0.5. The randomized control mask is applied to both tasks."
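The reported training recipe (batch size, step count, learning rate, sampling steps, and 50/50 task mixing) can be sketched as a minimal configuration. All identifiers here are illustrative assumptions, not the paper's code.

```python
import random

# Reported hyperparameters from the paper's training details.
BATCH_SIZE = 1024          # "train batch size of 1024"
TRAIN_STEPS = 1_200_000    # "train for 1.2M steps"
INIT_LR = 3e-4             # "initial learning rate of 3 × 10⁻⁴"
SAMPLING_STEPS = 16        # "16 diffusion sampling steps"
BP_MIX_PROB = 0.5          # mix BP with scene generation, p = 0.5


def sample_training_task(rng: random.Random) -> str:
    """Pick the task for one training batch: behavior prediction (BP)
    or scene generation, each with probability 0.5."""
    if rng.random() < BP_MIX_PROB:
        return "behavior_prediction"
    return "scene_generation"


# Sanity check: over many draws, the two tasks appear roughly equally often.
rng = random.Random(0)
counts = {"behavior_prediction": 0, "scene_generation": 0}
for _ in range(10_000):
    counts[sample_training_task(rng)] += 1
print(counts)
```

The randomized control mask mentioned in the quote would be applied to whichever task is drawn; it is omitted here since the paper's excerpt does not specify its construction.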