SimGen: Simulator-conditioned Driving Scene Generation
Authors: Yunsong Zhou, Michael Simon, Zhenghao (Mark) Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further demonstrate the improvements brought by SimGen for synthetic data augmentation on the BEV detection and segmentation task and showcase its capability in safety-critical data generation. |
| Researcher Affiliation | Academia | 1 University of California, Los Angeles 2 Shanghai Jiao Tong University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes and data are available in https://github.com/metadriverse/SimGen, and we show full implementation details in Appendix C. |
| Open Datasets | Yes | A driving video dataset DIVA is collected to enhance the generative diversity of SimGen, which contains over 147.5 hours of real-world driving videos from 73 locations worldwide and simulated driving data from the MetaDrive simulator. |
| Dataset Splits | Yes | The nuScenes dataset [6] is a public driving dataset that includes 1000 scenes from Boston and Singapore for diverse driving tasks [87, 42, 41]. Each scene comprises a 20-second video, approximately 40 frames. It provides 700 training scenes, 150 validation scenes, and 150 test scenes. |
| Hardware Specification | Yes | The default GPUs in most of our experiments are NVIDIA Tesla A6000 devices unless otherwise specified. |
| Software Dependencies | Yes | Concretely, we utilize Stable Diffusion 2.1 (SD-2.1) [60], a large-scale latent diffusion model for text-to-image generation. It is implemented as a denoising UNet, denoted by ϵ_θ, with multiple stacked convolutional and attention blocks, which learns to synthesize images by denoising latent noise. |
| Experiment Setup | Yes | It is trained on 4.5M text-depth-segmentation pairs of DIVA-Real and nuScenes. We train the model for 30K iterations on 8 GPUs with a batch size of 96 with AdamW [43]. We linearly warm up the learning rate for 10³ steps in the beginning, then keep it constant at 1 × 10⁻⁵. |
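
The Software Dependencies row describes SD-2.1's denoising UNet ϵ_θ, which is trained to predict the noise injected into a latent. The sketch below illustrates that standard epsilon-prediction objective; the function name, calling convention eps_theta(z_t, t), and noise schedule are illustrative assumptions, not SimGen's actual code.

```python
import torch
import torch.nn.functional as F

def denoising_loss(eps_theta, z0, t, alphas_cumprod):
    """Standard latent-diffusion epsilon-prediction loss (illustrative sketch).

    eps_theta:      denoising UNet, called as eps_theta(z_t, t) (assumed API)
    z0:             clean image latents, shape (B, C, H, W)
    t:              integer timesteps, shape (B,)
    alphas_cumprod: cumulative noise-schedule products, shape (T,)
    """
    noise = torch.randn_like(z0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    # Forward diffusion: corrupt the clean latent with scheduled Gaussian noise.
    z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise
    # The UNet is trained to recover the injected noise from the noisy latent.
    return F.mse_loss(eps_theta(z_t, t), noise)
```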
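
The Experiment Setup row quotes a concrete recipe: AdamW, a linear warmup over 10³ steps, then a constant learning rate of 1 × 10⁻⁵ for 30K iterations. A minimal PyTorch sketch of that schedule follows; the placeholder model and the bare training loop are assumptions for illustration only.

```python
import torch

model = torch.nn.Conv2d(4, 4, 3, padding=1)  # placeholder for the SimGen UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # constant LR from the paper

WARMUP_STEPS = 1_000   # "linearly warm up the learning rate for 10^3 steps"
TOTAL_STEPS = 30_000   # "30K iterations"

# Ramp the LR multiplier linearly from ~0 to 1 over the warmup window, then hold at 1.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / WARMUP_STEPS)
)

for step in range(TOTAL_STEPS):
    # ... forward pass and loss.backward() on a batch of 96 would go here ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```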