Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DiffE2E: Rethinking End-to-End Driving with a Hybrid Diffusion-Regression-Classification Policy

Authors: Rui Zhao, Yuze Fan, Ziguo Chen, Fei Gao, Zhenhai Gao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that DiffE2E achieves state-of-the-art performance on both CARLA closed-loop benchmarks and NAVSIM evaluations. The proposed unified framework that integrates diffusion and explicit strategies provides a generalizable paradigm for hybrid action representation and shows substantial potential for extension to broader domains, including embodied intelligence.
Researcher Affiliation	Academia	(1) College of Automotive Engineering, Jilin University; (2) National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University
Pseudocode	No	The paper describes the methodology in prose and mathematical equations within Section 3 'Methodology', but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	Justification: The code and model checkpoints will be released soon.
Open Datasets	Yes	This research is primarily evaluated using the CARLA simulator closed-loop benchmark [14] and the NAVSIM non-reactive simulation benchmark [12].
Dataset Splits	Yes	We adopt CARLA Longest6, CARLA Town05 Long, and CARLA Town05 Short as evaluation benchmarks [9, 38], using the official Driving Score (DS), Route Completion (RC), and Infraction Score (IS) as metrics. Detailed implementation details and baseline descriptions are provided in Appendix B.1. This study builds a model training framework based on NAVSIM's navtrain dataset. Unlike the CARLA setup, we adopt VoVNetV2-99 [30] as the feature extraction backbone network in NAVSIM. The Predictive Driver Model Score (PDMS) is used as a comprehensive metric, combining key driving dimensions via weighted integration: No at-fault Collision (NC), Drivable Area Compliance (DAC), Time-To-Collision (TTC), Comfort (C), and Ego Progress (EP). Detailed implementation details and baseline descriptions can be found in Appendix B.2.
Hardware Specification	Yes	All experiments are conducted on four NVIDIA 3090 GPUs.
Software Dependencies	No	The paper mentions software components like 'RegNetY-3.2GF' and 'VoVNetV2-99' as encoders, which are models typically implemented using deep learning frameworks (e.g., PyTorch, TensorFlow), but it does not specify explicit version numbers for these frameworks or any other software libraries used for implementation.
Experiment Setup	Yes	The entire training is divided into two stages, each trained for 30 epochs, with an initial learning rate of 3e-4. Batch size is adapted for different stages (16 for the first stage and 256 for the second stage) to accelerate the convergence of the diffusion model. The specific hyperparameter settings are shown in Table 5.
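
The Dataset Splits entry above describes PDMS as combining NC, DAC, TTC, C, and EP "via weighted integration" without giving the formula. The sketch below shows one way such a score can be aggregated, assuming the form commonly associated with the NAVSIM benchmark: NC and DAC act as multiplicative penalties, and EP, TTC, and C enter a weighted average with weights 5, 5, and 2. The weights, the `SubScores` class, and the `pdm_score` function are assumptions made for illustration; they are not taken from this page or confirmed by the paper excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubScores:
    """Per-scenario sub-scores, each assumed to lie in [0, 1]."""
    nc: float       # No at-fault Collision
    dac: float      # Drivable Area Compliance
    ttc: float      # Time-To-Collision
    comfort: float  # Comfort (C)
    ep: float       # Ego Progress


def pdm_score(s: SubScores, w_ep: float = 5.0, w_ttc: float = 5.0,
              w_c: float = 2.0) -> float:
    """Aggregate sub-scores into a single score in [0, 1].

    Assumed form: NC * DAC * weighted_mean(EP, TTC, C).
    The default weights (5, 5, 2) are an assumption, not from the source.
    """
    weighted = (w_ep * s.ep + w_ttc * s.ttc + w_c * s.comfort) / (w_ep + w_ttc + w_c)
    return s.nc * s.dac * weighted


if __name__ == "__main__":
    example = SubScores(nc=1.0, dac=1.0, ttc=1.0, comfort=1.0, ep=0.5)
    print(f"PDMS (illustrative): {pdm_score(example):.4f}")
```

Under this assumed form, any collision or drivable-area violation (NC or DAC equal to 0) zeroes the score regardless of progress or comfort, which is why the multiplicative terms are kept separate from the weighted average.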