Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving
Authors: Huitong Yang, Zhuoxiao Chen, Fengyi Zhang, Zi Huang, Yadan Luo
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments across five benchmarks for end-to-end autonomous driving and outdoor 3D object detection: KITTI [13], KITTI-C [25], Waymo [40], nuScenes [4], and nuScenes-C [53]. ... Table 1 shows CodeMerge consistently outperforms all baselines... Table 5: Ablation study on different checkpoint selection strategies... |
| Researcher Affiliation | Academia | UQMM Lab, The University of Queensland |
| Pseudocode | No | The paper describes its methodology using prose and mathematical equations in Section 3 'Our Approach' but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | The code is released at https://github.com/UQHTy/CodeMerge. |
| Open Datasets | Yes | We conduct comprehensive experiments across five benchmarks for end-to-end autonomous driving and outdoor 3D object detection: KITTI [13], KITTI-C [25], Waymo [40], nuScenes [4], and nuScenes-C [53]. |
| Dataset Splits | Yes | We conduct comprehensive experiments across five benchmarks for end-to-end autonomous driving and outdoor 3D object detection: KITTI [13], KITTI-C [25], Waymo [40], nuScenes [4], and nuScenes-C [53]. For test-time adaptation in end-to-end autonomous driving, we pre-train models on the nuScenes driving benchmark and adapt them to eight real-world corruptions in nuScenes-C... Table 1: Perception and tracking results of the end-to-end SparseDrive model [41] with and without TTA on the nuScenes-C [53] validation set... |
| Hardware Specification | Yes | We report in Section 4 that all experiments run on a single NVIDIA RTX A6000 GPU with 48 GiB of memory. |
| Software Dependencies | No | The paper mentions employing ResNet50 as the backbone network and the AdamW optimizer, and adopting SECOND as the pretrained model, but does not provide specific software library names with version numbers (e.g., PyTorch 1.9, Python 3.8, CUDA 11.1). |
| Experiment Setup | Yes | For the end-to-end autonomous driving task, we employ ResNet50 [17] as the backbone network... All input images are resized to 256 × 704. We use a 900 × 256 instance query as input to the transformer layers. Our optimization strategy utilizes the AdamW optimizer, configured with a weight decay of 0.001 and an initial learning rate of 1 × 10⁻⁷. To balance computational efficiency and prediction accuracy, we apply a random projection module to reduce the dimensionality of query features extracted from the pretrained model, resulting in a compact 1024-dimensional feature vector, and manage predictions through a model bank with a limited capacity of five models. ... For the point cloud detection tasks, we adopt the SECOND [55] as our pretrained model. We configure the training with a batch size of 8, a learning rate of 0.01, and a weight decay of 0.01. Additionally, we utilize a 900 × 256-dimensional 3D feature vector as input to the leverage module, enabling efficient and effective model merging. |
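For readers attempting a re-run, the hyperparameters quoted in the Experiment Setup row can be gathered into one place. This is a minimal sketch, not the authors' configuration file: the dictionary and key names are hypothetical, and only the values come from the quoted text (the learning rate of 1 × 10⁻⁷ is written as `1e-7`).

```python
# Values transcribed from the Experiment Setup row above. The dictionary and
# key names are hypothetical; they are not taken from the CodeMerge repository.
E2E_DRIVING_TTA = {
    "backbone": "ResNet50",
    "input_size_hw": (256, 704),         # resized input images
    "instance_query_shape": (900, 256),  # queries fed to the transformer layers
    "optimizer": "AdamW",
    "lr": 1e-7,                          # initial learning rate, 1 x 10^-7
    "weight_decay": 1e-3,
    "projected_feature_dim": 1024,       # output of the random projection module
    "model_bank_capacity": 5,
}

POINT_CLOUD_DETECTION_TTA = {
    "pretrained_model": "SECOND",
    "batch_size": 8,
    "lr": 1e-2,
    "weight_decay": 1e-2,
    "feature_shape": (900, 256),         # input to the leverage module
}
```

As the Software Dependencies row notes, library versions are not reported, so any framework-specific wiring of these values is left to the released code.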
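The Experiment Setup row mentions two components: a random projection module that compresses query features into a 1024-dimensional vector, and a model bank capped at five models. The sketch below illustrates both under stated assumptions. It applies a Gaussian random projection to the flattened query features, although the row does not say how the features are flattened, pooled, or projected. It also evicts the oldest entry when the bank is full. This FIFO rule is a placeholder and not the paper's checkpoint selection strategy; the strategies the paper compares are ablated in its Table 5. `make_projection`, `project`, and `ModelBank` are hypothetical names. The demo uses toy dimensions because a pure-Python 230,400 × 1024 matrix would be impractical.

```python
import math
import random
from collections import deque


def make_projection(in_dim, out_dim, seed=0):
    """Gaussian random projection matrix of shape (out_dim, in_dim).

    Entries are N(0, 1) scaled by 1/sqrt(out_dim), so squared norms are
    preserved in expectation (Johnson-Lindenstrauss style). This choice is
    an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, query_features):
    """Flatten (num_queries x feat_dim) features and apply the projection."""
    flat = [x for row in query_features for x in row]
    if len(flat) != len(matrix[0]):
        raise ValueError("flattened feature size must equal the projection input dim")
    return [sum(w * x for w, x in zip(row, flat)) for row in matrix]


class ModelBank:
    """Fixed-capacity bank of (checkpoint_id, projected_feature) pairs.

    deque(maxlen=...) drops the oldest entry when full: a FIFO placeholder,
    NOT CodeMerge's checkpoint selection strategy.
    """

    def __init__(self, capacity=5):
        self.entries = deque(maxlen=capacity)

    def add(self, checkpoint_id, feature):
        self.entries.append((checkpoint_id, feature))

    def ids(self):
        return [cid for cid, _ in self.entries]


if __name__ == "__main__":
    # Toy sizes (9 queries x 4 dims -> 8); the reported setting is 900 x 256 -> 1024.
    num_queries, feat_dim, out_dim = 9, 4, 8
    proj = make_projection(num_queries * feat_dim, out_dim, seed=0)
    data_rng = random.Random(1)
    bank = ModelBank(capacity=5)
    for step in range(7):
        feats = [[data_rng.random() for _ in range(feat_dim)]
                 for _ in range(num_queries)]
        bank.add(f"ckpt_{step}", project(proj, feats))
    print(bank.ids())  # ['ckpt_2', 'ckpt_3', 'ckpt_4', 'ckpt_5', 'ckpt_6']
```

At the reported sizes, a NumPy or PyTorch tensor is the practical replacement for the nested lists. The actual projection and checkpoint-selection logic should be taken from the released code.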