Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving

Authors: Huitong Yang, Zhuoxiao Chen, Fengyi Zhang, Zi Huang, Yadan Luo

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct comprehensive experiments across five benchmarks for end-to-end autonomous driving and outdoor 3D object detection: KITTI [13], KITTI-C [25], Waymo [40], nuScenes [4], and nuScenes-C [53]. ... Table 1 shows CodeMerge consistently outperforms all baselines... Table 5: Ablation study on different checkpoint selection strategies...
Researcher Affiliation	Academia	UQMM Lab, The University of Queensland EMAIL
Pseudocode	No	The paper describes its methodology using prose and mathematical equations in Section 3 'Our Approach' but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code	Yes	The code is released at https://github.com/UQHTy/CodeMerge.
Open Datasets	Yes	We conduct comprehensive experiments across five benchmarks for end-to-end autonomous driving and outdoor 3D object detection: KITTI [13], KITTI-C [25], Waymo [40], nuScenes [4], and nuScenes-C [53].
Dataset Splits	Yes	We conduct comprehensive experiments across five benchmarks for end-to-end autonomous driving and outdoor 3D object detection: KITTI [13], KITTI-C [25], Waymo [40], nuScenes [4], and nuScenes-C [53]. For test-time adaptation in end-to-end autonomous driving, we pre-train models on the nuScenes driving benchmark and adapt them to eight real-world corruptions in nuScenes-C... Table 1: Perception and tracking results of the end-to-end SparseDrive model [41] with and without TTA on the nuScenes-C [53] validation set...
Hardware Specification	Yes	We report in Section 4 that all experiments run on a single NVIDIA RTX A6000 GPU with 48 GiB of memory.
Software Dependencies	No	The paper mentions employing ResNet50 as the backbone network and the AdamW optimizer, and adopting SECOND as the pretrained model, but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, Python 3.8, CUDA 11.1).
Experiment Setup	Yes	For the end-to-end autonomous driving task, we employ ResNet50 [17] as the backbone network... All input images are resized to 256 × 704. We use a 900 × 256 instance query as input to the transformer layers. Our optimization strategy utilizes the AdamW optimizer, configured with a weight decay of 0.001 and an initial learning rate of 1 × 10⁻⁷. To balance computational efficiency and prediction accuracy, we apply a random projection module to reduce the dimensionality of query features extracted from the pretrained model, resulting in a compact 1024-dimensional feature vector, and manage predictions through a model bank with a limited capacity of five models. ... For the point cloud detection tasks, we adopt the SECOND [55] as our pretrained model. We configure the training with a batch size of 8, a learning rate of 0.01, and a weight decay of 0.01. Additionally, we utilize a 900 × 256 dimensional 3D feature vector as input to the leverage module, enabling efficient and effective model merging.
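
The hyperparameters quoted in the Experiment Setup row can be gathered into a small settings sketch. This is an illustration only and is not taken from the released code: the class names, field names, and the `make_model_bank` helper are hypothetical, the learning-rate exponent is assumed to be negative (the extracted text lost the sign), and the oldest-first eviction of the bounded bank is an assumption, since the row states only that the bank holds at most five models.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingTTASettings:
    """Values quoted for the end-to-end driving task (field names are hypothetical)."""
    backbone: str = "ResNet50"
    image_size: tuple = (256, 704)   # resized input resolution
    query_shape: tuple = (900, 256)  # instance queries fed to the transformer layers
    optimizer: str = "AdamW"
    weight_decay: float = 1e-3
    learning_rate: float = 1e-7      # assumption: exponent sign read as negative
    projected_dim: int = 1024        # feature size after random projection
    bank_capacity: int = 5           # maximum number of models kept in the bank


@dataclass(frozen=True)
class DetectionTTASettings:
    """Values quoted for the point cloud detection tasks."""
    pretrained_model: str = "SECOND"
    batch_size: int = 8
    learning_rate: float = 0.01
    weight_decay: float = 0.01


def make_model_bank(capacity):
    """Bounded container; oldest-first eviction is an assumption, not the paper's rule."""
    return deque(maxlen=capacity)


settings = DrivingTTASettings()
bank = make_model_bank(settings.bank_capacity)
for step in range(7):
    # Placeholder strings stand in for model checkpoints.
    bank.append(f"checkpoint_{step}")
```

After seven insertions the bank still holds only five entries, which is the capacity limit the row describes; how the paper actually chooses which checkpoints to keep or merge is not stated here.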