Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
Authors: Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiment results show that DriveDreamer-2 is capable of producing diverse user-customized videos, including uncommon scenarios where vehicles abruptly cut in (depicted in Fig. 1). Besides, DriveDreamer-2 can generate high-quality driving videos with an FID of 11.2 and FVD of 55.7, relatively improving previous best-performing methods by 30% and 50%. Furthermore, experiments verify that DriveDreamer-2-generated driving videos can enhance the training of autonomous driving perception methods, improving detection by 4% and tracking by 8%. |
| Researcher Affiliation | Collaboration | Guosheng Zhao1,2,3*, Xiaofeng Wang1,2,3*, Zheng Zhu4* , Xinze Chen4, Guan Huang4, Xiaoyi Bao1,2,3, Xingang Wang2,3 1School of Artificial Intelligence, University of Chinese Academy of Sciences 2Institute of Automation, Chinese Academy of Sciences 3Luoyang Institute for Robot and Intelligent Equipment 4Giga AI |
| Pseudocode | Yes | Function Library agent #Generate a trajectory of cutting in. def cut_in(obj_trajs, obj_vels, safe_dis, is_ego) #Generate a forward trajectory. def forward(obj_trajs, obj_vels, safe_dis, is_ego) ... #Set a random seed to generate trajectories. def set_random_seed(seed) #Save the trajectories of agents. def save_trajectories(ego_trajs, obj_trajs) ... #Generate a trajectory of crossing road. def pedestrian_crossing() ... Python Script #import specific libraries import libs #set a random seed utils.set_random_seed(seed = 3577) #generate a trajectory of cutting in obj_trajs, obj_vels = agent.cut_in(is_ego = False) #generate the ego car trajectory ego_trajs = agent.forward(obj_trajs = obj_trajs, obj_vels = obj_vels, safe_dis = 8, is_ego = True) #save the generated trajectories utils.save_trajectories(ego_trajs = ego_trajs, obj_trajs = obj_trajs) |
| Open Source Code | Yes | Project Page https://drivedreamer2.github.io |
| Open Datasets | Yes | The training dataset is derived from the nuScenes dataset (Caesar et al. 2019), comprising 700 training videos and 150 validation videos. |
| Dataset Splits | Yes | The training dataset is derived from the nuScenes dataset (Caesar et al. 2019), comprising 700 training videos and 150 validation videos. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA A800 (80GB) GPUs, and we use the AdamW optimizer (Kingma and Ba 2014) with a learning rate of 5×10⁻⁵. |
| Software Dependencies | Yes | For agent trajectory generation, we employ GPT-3.5 as the LLM and finetune it utilizing a text-to-script dataset to specialize in trajectory generation knowledge. The proposed HDMap generator is built upon SD2.1 (Rombach et al. 2022) with the ControlNet parameters (Zhang, Rao, and Agrawala 2023) being trainable. It is trained for 55K iterations with a batch size of 24 at a resolution of 512×512. The video generator leverages SVD (Blattmann et al. 2023) for its robust video generation capabilities, with all parameters finetuned. During UniMVM training, the model underwent 200K iterations with a batch size of 1, an N = 8 frame length, K = 6 views, and a spatial size of 256×448. All the experiments are conducted on NVIDIA A800 (80GB) GPUs, and we use the AdamW optimizer (Kingma and Ba 2014) with a learning rate of 5×10⁻⁵. |
| Experiment Setup | Yes | For agent trajectory generation, we employ GPT-3.5 as the LLM and finetune it utilizing a text-to-script dataset to specialize in trajectory generation knowledge. The proposed HDMap generator is built upon SD2.1 (Rombach et al. 2022) with the ControlNet parameters (Zhang, Rao, and Agrawala 2023) being trainable. It is trained for 55K iterations with a batch size of 24 at a resolution of 512×512. The video generator leverages SVD (Blattmann et al. 2023) for its robust video generation capabilities, with all parameters finetuned. During UniMVM training, the model underwent 200K iterations with a batch size of 1, an N = 8 frame length, K = 6 views, and a spatial size of 256×448. All the experiments are conducted on NVIDIA A800 (80GB) GPUs, and we use the AdamW optimizer (Kingma and Ba 2014) with a learning rate of 5×10⁻⁵. |
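The pseudocode row above shows the pattern the paper relies on: a fixed function library (`cut_in`, `forward`, `set_random_seed`, `save_trajectories`, ...) plus a short Python script, emitted by the finetuned GPT-3.5, that composes those functions into a scenario. The sketch below illustrates that calling pattern only; the function bodies, signatures, and trajectory math here are hypothetical placeholders (the paper does not publish the library's internals), with lateral merge modeled as a simple linear blend toward the ego lane.

```python
import random

# Hypothetical stand-ins for the paper's "agent"/"utils" function library.
# Trajectories are lists of (x, y) positions; units and dynamics are illustrative.

def set_random_seed(seed):
    """Seed the generator so a scripted scenario is reproducible."""
    random.seed(seed)

def forward(start=(0.0, 0.0), vel=8.0, steps=8, dt=0.5):
    """Straight-ahead trajectory at constant velocity along the x-axis."""
    x, y = start
    return [(x + vel * dt * t, y) for t in range(steps)]

def cut_in(start=(5.0, 3.5), vel=9.0, steps=8, dt=0.5):
    """Cut-in trajectory: the agent merges laterally into the ego lane (y -> 0)."""
    x, y = start
    traj = []
    for t in range(steps):
        frac = t / (steps - 1)  # lateral merge progress in [0, 1]
        traj.append((x + vel * dt * t, y * (1.0 - frac)))
    return traj

def save_trajectories(ego_trajs, obj_trajs):
    """Bundle the generated trajectories, mirroring the script's save step."""
    return {"ego": ego_trajs, "agents": obj_trajs}

# A script in the shape the finetuned LLM is prompted to emit.
set_random_seed(3577)
obj_trajs = cut_in()                      # other vehicle abruptly cuts in
ego_trajs = forward()                     # ego car keeps driving forward
scene = save_trajectories(ego_trajs, [obj_trajs])
print(len(scene["ego"]), scene["agents"][0][-1])  # 8 frames; agent ends at y == 0.0
```

The design point is that the LLM never invents geometry: it only selects library functions and parameters (seed, velocities, `safe_dis`), which keeps generated scripts executable and the resulting trajectories physically constrained.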