Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

Authors: Yu Yang, Alan Liang, Jianbiao Mei, Yukai Ma, Yong Liu, Gim Hee Lee

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that X-Scene substantially advances controllability and fidelity in large-scale scene generation, empowering data generation and simulation for autonomous driving.
Researcher Affiliation | Academia | Yu Yang (1,2), Alan Liang (2), Jianbiao Mei (1), Yukai Ma (1), Yong Liu (1), Gim Hee Lee (2); (1) Zhejiang University, (2) National University of Singapore
Pseudocode | Yes | Algorithm 1: Textual Scene Description Generation via VLM, LLM, and RAG
Open Source Code | Yes | To ensure reproducibility, code and data are committed to be publicly available.
Open Datasets | Yes | We use Occ3D-nuScenes [92] to train the occupancy module and nuScenes [93] for the multi-view image and video generation modules. Additional implementation details are provided in the appendix. F.1 Public Datasets Used: nuScenes (CC BY-NC-SA 4.0); nuScenes-devkit (Apache License 2.0); Occ3D (MIT License)
Dataset Splits | Yes | We follow the standard split of 700 training and 150 validation scenes.
Hardware Specification | Yes | training and evaluation were conducted on a single NVIDIA A6000 GPU with 48GB of memory. ... trained over 200 epochs on 4 NVIDIA A6000 GPUs with a batch size of 24 per GPU. ... The temporal module is trained on eight NVIDIA A100 GPUs using the AdamW optimizer with a learning rate of 8 × 10^-5 and a cosine learning rate scheduler. ... Each scene chunk is generated in about 6 seconds on a single RTX A6000 GPU
Software Dependencies | Yes | To construct the scene description memory bank M, we utilize QWen2.5-VL [99] to extract structured information from nuScenes.
Experiment Setup | Yes | We employed a batch size of 128 and trained the model for 400 epochs. The optimization was performed using the AdamW optimizer with an initial learning rate of 1 × 10^-4 and a cosine annealing scheduler. ... The triplane-VAE is trained using the Adam optimizer with an initial learning rate of 1 × 10^-3 and a step decay factor of 0.1, over 200 epochs on 4 NVIDIA A6000 GPUs with a batch size of 24 per GPU. ... The diffusion model is trained from scratch using the AdamW optimizer with an initial learning rate of 1 × 10^-4 and a cosine scheduler, over 300 epochs with a batch size of 12 per GPU. ... During inference, we use the UniPC [100] scheduler with 20 steps and a Classifier-Free Guidance (CFG) scale of 1.2.
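
The Experiment Setup entry quotes a cosine learning-rate schedule (initial rate 1 × 10^-4 over 300 epochs for the diffusion model) and a Classifier-Free Guidance scale of 1.2. The following is a minimal Python sketch of what those two settings commonly mean, not the authors' implementation. It assumes a plain cosine decay to zero with no warmup or restarts, and the standard guidance formula uncond + scale × (cond − uncond); neither detail is confirmed by the quoted text, and the function names `cosine_lr` and `cfg_combine` are hypothetical.

```python
import math

# Values quoted in the Experiment Setup entry for the diffusion model.
BASE_LR = 1e-4
TOTAL_EPOCHS = 300
CFG_SCALE = 1.2


def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine-annealed learning rate at a given epoch.

    Assumption: decay from base_lr to min_lr with no warmup or restarts.
    """
    # Clamp progress to [0, 1] so out-of-range epochs stay well defined.
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def cfg_combine(uncond, cond, scale):
    """Blend unconditional and conditional predictions element-wise.

    Assumption: the standard classifier-free guidance formula
    uncond + scale * (cond - uncond), applied here to plain lists of floats.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    for epoch in (0, 150, 300):
        print(epoch, cosine_lr(BASE_LR, epoch, TOTAL_EPOCHS))
    print(cfg_combine([0.0, 0.5], [1.0, 2.0], CFG_SCALE))
```

Under these assumptions, a scale of 1.0 returns the conditional prediction unchanged, so the quoted value of 1.2 pushes the output slightly further toward the conditioning signal.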