MagicDrive: Street View Generation with Diverse 3D Geometry Control

Authors: Ruiyuan Gao, Kai Chen, Enze Xie, Lanqing HONG, Zhenguo Li, Dit-Yan Yeung, Qiang Xu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "5 EXPERIMENTS", "Dataset and Baselines. We employ the nuScenes dataset (Caesar et al., 2020)...", "Evaluation Metrics. We evaluate both realism and controllability for street view generation. Realism is mainly measured using Fréchet Inception Distance (FID)...", "Table 1: Comparison of generation fidelity with driving-view generation methods." |
| Researcher Affiliation | Collaboration | 1. The Chinese University of Hong Kong; 2. Hong Kong University of Science and Technology; 3. Huawei Noah's Ark Lab |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Project Page: https://flymin.github.io/magicdrive. |
| Open Datasets | Yes | "We employ the nuScenes dataset (Caesar et al., 2020)" |
| Dataset Splits | Yes | "We adhere to the official configuration, utilizing 700 street-view scenes for training and 150 for validation." |
| Hardware Specification | Yes | "trained on Nvidia V100 GPUs." |
| Software Dependencies | No | The paper mentions "Stable Diffusion v1.5" but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | "We adopt two resolutions to reconcile discrepancies in perception tasks and baselines: 224×400 (0.25× down-sample) following BEVGen and for CVT model support, and a higher 272×736 (0.5× down-sample) for BEVFusion support. Unless stated otherwise, images are sampled using the UniPC (Zhao et al., 2023) scheduler for 20 steps with CFG at 2.0. We train all newly added parameters using the AdamW (Loshchilov & Hutter, 2019) optimizer with a constant learning rate of 8e-5 and batch size 24 (total 144 images for 6 views) with a linear warm-up of 3000 iterations, and set γs = 0.2." |
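The reported optimization schedule (constant learning rate of 8e-5 after a 3000-iteration linear warm-up) can be sketched as a small pure-Python helper. This is an illustrative reconstruction, not the authors' code; the function name and the warm-up-from-zero assumption are ours.

```python
# Hedged sketch of the reported LR schedule: linear warm-up over 3000
# iterations to a constant 8e-5 (assumes warm-up starts from zero).
BASE_LR = 8e-5
WARMUP_ITERS = 3000

def learning_rate(step: int) -> float:
    """Return the learning rate at a given training step.

    Ramps linearly from 0 to BASE_LR during the first WARMUP_ITERS
    steps, then stays constant, matching the setup quoted above.
    """
    scale = min(1.0, step / WARMUP_ITERS)
    return BASE_LR * scale
```

In practice this would be passed to a framework scheduler (e.g., as a multiplier for PyTorch's `LambdaLR` wrapping an AdamW optimizer over only the newly added parameters, as the paper describes).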