Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Model-Based Policy Adaptation for Closed-Loop End-to-end Autonomous Driving

Authors: Haohong Lin, Yunzhi Zhang, Wenhao Ding, Jiajun Wu, Ding Zhao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the nuScenes benchmark using a photorealistic closed-loop simulator demonstrate that MPA significantly improves performance across in-domain, out-of-domain, and safety-critical scenarios.
Researcher Affiliation | Collaboration | Haohong Lin¹, Yunzhi Zhang², Wenhao Ding³, Jiajun Wu², Ding Zhao¹; ¹CMU, ²Stanford, ³NVIDIA; EMAIL
Pseudocode | Yes | The full generation procedure is summarized in Algorithm 1. With the generated counterfactual dataset at hand, we then conduct policy learning and value learning in the following sub-sections.
Open Source Code | Yes | The link to our anonymous codebase is attached at: https://anonymous.4open.science/r/MPA-7432.
Open Datasets | Yes | We utilize the nuScenes dataset [29] that consists of 5.5 hours of driving data in Boston and Singapore.
Dataset Splits | Yes | We train on a split of 290 scenes in the nuScenes train-val split, and evaluate on three settings. (1) In-domain evaluation: the model will be tested on a sub-split of 70 scenes, the surrounding dynamic entities (vehicles, pedestrians) will be replayed by a fixed ratio of their reference trajectory in the offline dataset. (2) Unseen nominal scene evaluation: the model will be tested on a sub-split of 70 scenes that are unseen yet during training, the surrounding dynamic entities (vehicles, pedestrians) are nominal and will be replayed by a fixed ratio of their reference trajectory in the offline dataset. (3) Safety-critical evaluation: the model will be tested on 10 scenes, where there exists one (or few) non-native agents to challenge the ego agents in an adversarial way.
Hardware Specification | Yes | The experiments are run on a server with AMD EPYC 7542 32-Core Processor CPU with 256 threads, 4 NVIDIA A5000 graphics, and 252 GB memory.
Software Dependencies | No | The paper mentions software components and architectures like ResNet-18, AdamW, DDIM, UniAD, VAD, and LTF, but does not provide specific version numbers for underlying software libraries or programming languages (e.g., Python version, PyTorch version, CUDA version) used in their implementation.
Experiment Setup | Yes | Table 4: Hyperparameters for Our Methods (columns: Component, Value, Description). Policy Adapter Architecture: BEV encoder: ResNet-18 (CNN encoder for BEV input); Ego encoder: 128-dim (encodes ego history features); Action encoder: 128-dim (encodes the base actions and noisy target future trajectories); Fused input dimension: 960 (concatenated input vector); Latent fusion module: 1D U-Net (applies down/up-sampling over 960-dim vector); Latent dimension: 256 (feature dimension used throughout latent layers); Residual prediction heads: 1-20 (each outputs a 12-dimensional trajectory residual); Mixture weight head: 1 (outputs logits over modes); DDIM steps (training): 25 (number of noisy timesteps during training). Training Settings (Policy Adapter): Batch size: 256 (samples per mini-batch); Learning rate: 1 × 10⁻⁴ (initial LR with cosine decay); Optimizer: AdamW (with weight decay of 10⁻⁴); Epochs: 1000 (total training iterations); Gradient clipping: 1.0 (global gradient norm threshold).
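
As a rough illustration of how the training settings quoted from Table 4 fit together, the sketch below collects them in a small config object and computes a cosine-decayed learning rate and a global-norm clipping factor. It is not the authors' code. The class and function names (`AdapterTrainConfig`, `cosine_lr`, `clip_scale`) are hypothetical, and it assumes that the decay runs once over all 1000 epochs, ends at zero, and uses no warmup; the excerpt states none of these.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterTrainConfig:
    """Values copied from the Table 4 excerpt; field names are hypothetical."""
    batch_size: int = 256
    base_lr: float = 1e-4        # "Initial LR with cosine decay"
    weight_decay: float = 1e-4   # AdamW weight decay
    epochs: int = 1000           # listed as "Epochs"; treated as epochs here
    grad_clip_norm: float = 1.0  # global gradient-norm threshold
    ddim_train_steps: int = 25
    latent_dim: int = 256
    fused_input_dim: int = 960


def cosine_lr(epoch: int, cfg: AdapterTrainConfig, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate for a given epoch.

    Assumption: a single decay over all epochs, ending at min_lr, no warmup.
    """
    progress = min(max(epoch, 0), cfg.epochs) / cfg.epochs
    return min_lr + 0.5 * (cfg.base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_scale(grad_norm: float, cfg: AdapterTrainConfig) -> float:
    """Factor applied to all gradients under global-norm clipping."""
    if grad_norm <= cfg.grad_clip_norm:
        return 1.0
    return cfg.grad_clip_norm / grad_norm


cfg = AdapterTrainConfig()
for epoch in (0, 250, 500, 1000):
    print(epoch, cosine_lr(epoch, cfg))
```

In PyTorch the corresponding pieces would typically be `torch.optim.AdamW`, `torch.optim.lr_scheduler.CosineAnnealingLR`, and `torch.nn.utils.clip_grad_norm_`, though the excerpt does not confirm which of these the authors use.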