Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Authors: Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the effectiveness of our approach, we construct a closed-loop evaluation benchmark comprising diverse, unseen 3DGS environments. Our method, RAD, outperforms IL-based approaches across most closed-loop metrics, notably achieving a collision rate that is 3× lower. ... 4 Experiments 4.1 Experimental Settings Dataset and Benchmark. We collect 2000 hours of expert driving demonstrations in real-world conditions and generate map and agent annotations via an automated pipeline for perception pretraining. Ego-vehicle odometry is employed for planning pre-training. For reinforcement learning, we ... Table 4: Closed-loop quantitative comparisons with other IL-based methods on the 3DGS evaluation benchmark.
Researcher Affiliation | Collaboration | Hao Gao¹, Shaoyu Chen¹,², Bo Jiang¹, Bencheng Liao¹, Yiang Shi¹, Xiaoyang Guo², Yuechuan Pu², Haoran Yin², Xiangyu Li², Xinbang Zhang², Ying Zhang², Wenyu Liu¹, Qian Zhang², Xinggang Wang¹; ¹ Huazhong University of Science & Technology, ² Horizon Robotics
Pseudocode | No | The paper describes the overall framework, training paradigm, and optimization steps using textual descriptions, equations, and diagrams (e.g., Fig. 2 and Fig. 3). However, it does not include a clearly labeled pseudocode block or an algorithm section with structured, code-like steps.
Open Source Code | Yes | Code is available at https://github.com/hustvl/RAD for facilitating future research.
Open Datasets | No | Dataset and Benchmark. We collect 2000 hours of expert driving demonstrations in real-world conditions and generate map and agent annotations via an automated pipeline for perception pretraining. Ego-vehicle odometry is employed for planning pre-training. For reinforcement learning, we select 4305 real-world driving scenes covering diverse road types, traffic densities, and agent behaviors to ensure environmental diversity. These scenes are first reconstructed into 3DGS environments, from each of which a fixed-length 8-second clip is extracted. Among these clips, 3968 are used for RL training, and the other 337 are used as closed-loop evaluation benchmarks. ... Justification: Due to company confidentiality and data privacy policies, we are currently unable to release the dataset.
Dataset Splits | Yes | Among these clips, 3968 are used for RL training, and the other 337 are used as closed-loop evaluation benchmarks.
Hardware Specification | Yes | Table 5: Hyperparameters used in RAD Planning Pre-Training stage. ... Training GPU: 128 RTX4090. Table 6: Hyperparameters used in RAD Reinforced Post-Training stage. ... Training GPU: 32 RTX4090.
Software Dependencies | No | optimizer: AdamW [50, 51]; optimizer hyper-parameters: β1, β2, ϵ = 0.9, 0.999, 1e-8
Experiment Setup | Yes | Table 5: Hyperparameters used in RAD Planning Pre-Training stage. learning rate: 1e-4; learning rate schedule: cosine decay; optimizer: AdamW [50, 51]; optimizer hyper-parameters: β1, β2, ϵ = 0.9, 0.999, 1e-8; weight decay: 1e-4; batch size: 512; training steps: 30k; planning head dim: 256; Training GPU: 128 RTX4090. Table 6: Hyperparameters used in RAD Reinforced Post-Training stage. learning rate: 5e-6; learning rate schedule: cosine decay; optimizer: AdamW [50, 51]; optimizer hyper-parameters: β1, β2, ϵ = 0.9, 0.999, 1e-8; weight decay: 1e-4; RL worker number: 32; RL batch size: 32; IL batch size: 128; GAE parameter: γ = 0.9, λ = 0.95; clipping thresholds: ϵx = 0.1, ϵy = 0.2; deviation threshold: dmax = 2.0 m, ψmax = 40; planning head dim: 256; value function dim: 256; Training GPU: 32 RTX4090.
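
The split sizes and hyperparameters quoted in the rows above can be gathered into a small configuration sketch, which also makes the quoted numbers easy to sanity-check. The Python below is illustrative only and is not taken from the RAD repository: the names `StageConfig`, `cosine_lr`, and `gae` are hypothetical, the cosine schedule assumes decay to zero with no warmup (the quoted tables say only "cosine decay"), and `gae` is the textbook generalized advantage estimator for a single trajectory, which may differ from the paper's exact formulation. The number of post-training steps is not quoted, so it is left unset.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters quoted above; the field names are hypothetical."""
    learning_rate: float
    weight_decay: float
    betas: Tuple[float, float]
    eps: float
    num_gpus: int
    training_steps: Optional[int] = None  # not quoted for post-training


# Values copied from the Table 5 and Table 6 excerpts.
PLANNING_PRETRAIN = StageConfig(
    learning_rate=1e-4, weight_decay=1e-4, betas=(0.9, 0.999),
    eps=1e-8, num_gpus=128, training_steps=30_000,
)
REINFORCED_POSTTRAIN = StageConfig(
    learning_rate=5e-6, weight_decay=1e-4, betas=(0.9, 0.999),
    eps=1e-8, num_gpus=32,
)

# Clip counts from the Open Datasets and Dataset Splits rows.
TOTAL_CLIPS, TRAIN_CLIPS, EVAL_CLIPS = 4305, 3968, 337


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr to 0 (assumed: no warmup, floor of 0)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.9, lam: float = 0.95) -> List[float]:
    """Textbook GAE over one trajectory; value after the last step is 0."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # The two splits should account for every reconstructed clip.
    assert TRAIN_CLIPS + EVAL_CLIPS == TOTAL_CLIPS
    cfg = PLANNING_PRETRAIN
    for step in (0, 15_000, 30_000):
        print(step, cosine_lr(step, cfg.training_steps, cfg.learning_rate))
    print(gae([0.0, 1.0], [0.0, 0.0]))
```

Keeping the two stages in separate immutable records mirrors the two-table layout of the quoted excerpts and makes the differences between them (learning rate and GPU count) easy to see at a glance.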