Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ReSim: Reliable World Simulation for Autonomous Driving

Authors: Jiazhi Yang, Kashyap Chitta, Shenyuan Gao, Long Chen, Yuqian Shao, Xiaosong Jia, Hongyang Li, Andreas Geiger, Xiangyu Yue, Li Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate the effectiveness and versatility of our ReSim system. 3 Experiments: In this section, we first evaluate ReSim's simulation reliability, specifically relating to its action controllability, video prediction fidelity, and reasonableness of the reward formulation (Sec. 3.1). Next, we validate ReSim's applicability to real-world driving tasks (Sec. 3.2). Finally, we present ablation studies on data and methodological designs to verify their effectiveness (Sec. 3.3).
Researcher Affiliation	Collaboration	1 The Chinese University of Hong Kong; 2 The University of Hong Kong; 3 OpenDriveLab at Shanghai AI Lab; 4 NVIDIA Research; 5 Xiaomi EV; 6 Shanghai Jiao Tong University; 7 University of Tübingen, Tübingen AI Center; 8 HKUST
Pseudocode	No	The paper describes the model architecture, training pipeline, and loss functions using mathematical equations and descriptive text, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code	Yes	"We put more implementation details in Appendix B. We will publicly release our code, model, and dataset." (from NeurIPS Paper Checklist, Q4 Justification) and "We will release all code and models." (from NeurIPS Paper Checklist, Q5 Justification).
Open Datasets	Yes	Our training and evaluation are conducted on publicly licensed datasets and benchmarks [34, 124, 35, 23, 14]. To improve action diversity, we collected some data from the CARLA simulator [29] under the CC-BY License.
Dataset Splits	Yes	Specifically, 85K data samples from navtrain split of NAVSIM [23] are included in training. We evaluate the performance of various driving world models with FID [56] and FVD [57] metrics on nuScenes [34] validation set. This evaluation is conducted on a random subset of the Waymo validation set with 540 samples.
Hardware Specification	Yes	All training stages are conducted on 40 A100 GPUs, and the total training duration is around 14 days. Simulating a 4-second video sequence takes two minutes on a single Nvidia A100 GPU.
Software Dependencies	No	The paper mentions using specific models and toolkits like OpenCV, T5 encoder, and DINOv2, but it does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch).
Experiment Setup	Yes	$L = L_{\text{diffusion}} + \lambda L_{\text{dynamics}}$, where $\lambda$ is set to 0.1 empirically. $K$ is the maximum timestep intervals considered for latent motion, which is set to 4 in our experiments. Detailed learning configurations for different stages are included in Tab. S.5.
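
The weighting quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes the diffusion and dynamics loss terms have already been computed as scalars; the function name, argument names, and example values are hypothetical and are not taken from the paper or its code. Only the form of the sum and the value λ = 0.1 come from the quoted text.

```python
# Hypothetical illustration of the weighted sum L = L_diffusion + lambda * L_dynamics.
# LAMBDA_DYNAMICS = 0.1 is the value quoted in the Experiment Setup row;
# everything else (names, example inputs) is an assumption for illustration.

LAMBDA_DYNAMICS = 0.1


def combined_loss(diffusion_loss: float, dynamics_loss: float,
                  lam: float = LAMBDA_DYNAMICS) -> float:
    """Return the total loss as the diffusion term plus the weighted dynamics term."""
    return diffusion_loss + lam * dynamics_loss


if __name__ == "__main__":
    # Made-up scalar losses, only to show how the weighting is applied.
    total = combined_loss(diffusion_loss=1.0, dynamics_loss=2.0)
    print(f"total loss: {total:.3f}")
```

With λ = 0.1, the dynamics term contributes one tenth of its raw value to the total, so the diffusion term dominates unless the dynamics loss is roughly an order of magnitude larger.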