Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

Authors: Peidong Li, Dixiao Cui

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SSR achieves a 27.2% relative reduction in L2 error and a 51.6% decrease in collision rate compared to UniAD on nuScenes, with 10.9× faster inference and 13× faster training. Moreover, SSR outperforms VAD-Base by 48.6 points on driving score in CARLA's Town05 Long benchmark. This framework represents a significant leap in real-time autonomous driving systems and paves the way for future scalable deployment.
Researcher Affiliation | Industry | Peidong Li, Dixiao Cui, Zhijia Technology, Suzhou, China (EMAIL)
Pseudocode | No | The paper describes the method using mathematical equations and textual explanations, along with architectural diagrams (Figure 3, Figure 4). It does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/PeidongLi/SSR.
Open Datasets | Yes | Open-loop: We evaluate the proposed SSR framework for autonomous driving using the widely adopted nuScenes dataset (Caesar et al., 2020), following prior works (Hu et al., 2023; Jiang et al., 2023). Closed-loop: We conduct closed-loop experiments using the CARLA simulator (Dosovitskiy et al., 2017), leveraging the widely adopted Town05 Long benchmark to evaluate performance. The training dataset consists of 189K frames collected by Roach (Zhang et al., 2021) at 2 Hz across 4 CARLA towns (Town01, Town03, Town04, and Town06), following previous works (Jia et al., 2023a;b; Wu et al., 2022).
Dataset Splits | No | The paper mentions using the nuScenes dataset and the CARLA simulator's Town05 Long benchmark. It states that "The training data has no overlap with Town05 Long benchmark." and "All metrics are calculated in 3s future horizon with a 0.5s interval and evaluated at 1s, 2s and 3s." However, it does not provide explicit training/validation/test split percentages or sample counts for either dataset, instead referring to prior works or specific benchmarks.
Hardware Specification | Yes | Our open-loop model is trained for 12 epochs on 8 NVIDIA RTX 3090 GPUs with a batch size of 1 per GPU. The closed-loop model is trained for 60 epochs on 4 NVIDIA RTX 3090 GPUs with a batch size of 32 per GPU. Note: FPS was measured on an NVIDIA A100 GPU, while other methods were tested on an NVIDIA RTX 3090.
Software Dependencies | No | The paper mentions using the AdamW optimizer (Loshchilov & Hutter, 2019) and ResNet-50/ResNet-34 image backbones, but does not provide specific version numbers for any software libraries, programming languages (e.g., Python, PyTorch, TensorFlow), or other key dependencies.
Experiment Setup | Yes | Our open-loop model is trained for 12 epochs on 8 NVIDIA RTX 3090 GPUs with a batch size of 1 per GPU. The training phase costs about 11 hours, which is 13× faster than UniAD. We utilize the AdamW (Loshchilov & Hutter, 2019) optimizer with a learning rate set to 5×10⁻⁵. The weights of the imitation loss and BEV loss are both 1.0. The closed-loop model is trained for 60 epochs on 4 NVIDIA RTX 3090 GPUs with a batch size of 32 per GPU. The learning rate is set to 1×10⁻⁴ and halved after 30 epochs. We adopt ResNet-50 (He et al., 2016) as the image backbone, operating at an image resolution of 640×360. The BEV representation is generated at a 100×100 resolution and then compressed into sparse scene tokens with shape 16×256. The number of navigation commands remains 3, as in prior works (Hu et al., 2023; Jiang et al., 2023). In closed-loop simulation, we utilize ResNet-34 (He et al., 2016) as the image backbone, resizing the input image size to 900×256.
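The quoted setup pins down the two training configurations precisely enough to sketch them. Below is a minimal illustration, not the authors' code: the function and variable names (`closed_loop_lr`, `OPEN_LOOP`, `CLOSED_LOOP`) are assumptions, and only the numeric values come from the paper's excerpt.

```python
def closed_loop_lr(epoch: int, base_lr: float = 1e-4, halve_at: int = 30) -> float:
    """Closed-loop learning-rate schedule as described in the paper:
    a constant base rate of 1e-4, halved once after epoch 30
    (of 60 total epochs)."""
    return base_lr if epoch < halve_at else base_lr / 2

# Hyperparameters quoted from the paper's experiment setup.
OPEN_LOOP = {"epochs": 12, "gpus": 8, "batch_per_gpu": 1, "lr": 5e-5}
CLOSED_LOOP = {"epochs": 60, "gpus": 4, "batch_per_gpu": 32, "lr": 1e-4}
```

For example, `closed_loop_lr(10)` returns the base rate 1e-4, while `closed_loop_lr(45)` returns the halved rate 5e-5; the open-loop run uses a fixed learning rate throughout its 12 epochs.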