Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection

Authors: Gyusam Chang, Jiwon Lee, Donghyun Kim, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sujin Jang, Sangpil Kim

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the robustness of UDGA with large-scale benchmarks: nuScenes, Lyft, and Waymo, where our framework outperforms the current state-of-the-art methods.
Researcher Affiliation | Collaboration | Korea University and Samsung Advanced Institute of Technology
Pseudocode | No | The paper does not contain a dedicated section, figure, or block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | We plan to release source codes upon acceptance. At the current phase, we have provided details of implementations and necessary references to prior works in Sec. 4 (Experiment) and in Appendix A, B.
Open Datasets | Yes | Given landmark datasets in 3DOD, nuScenes [6], Lyft [7], and Waymo [5], we validate the effectiveness of our UDGA framework for the camera-based multi-view 3DOD task.
Dataset Splits | Yes | The nuScenes dataset covers 28k annotated samples for training. Also, validation and test contain 6k samples each.
Hardware Specification | Yes | The training takes approximately 18 hours using one A100 GPU.
Software Dependencies | No | The paper mentions BEVDepth and BEVFormer as base detectors and ResNet50 as the backbone, but does not provide specific version numbers for software dependencies such as PyTorch, CUDA, or other libraries.
Experiment Setup | Yes | In BEVDepth, we reshape multi-view input image resolutions as follows: [256, 704] for nuScenes, [384, 704] for Lyft, [320, 704] for Waymo. Following DG-BEV [14], we train for 24 epochs with the AdamW optimizer at a learning rate of 2e-4 in the pre-training phase.
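
For concreteness, the reported pre-training setup can be summarized in a minimal PyTorch-style sketch, assuming a BEVDepth-like pipeline; the constant names and the build_pretrain_optimizer helper below are hypothetical and are not taken from the authors' (unreleased) code.

    import torch
    from torch.optim import AdamW

    # Hypothetical constants summarizing the quoted pre-training setup;
    # values are from the paper excerpt, names are illustrative only.
    INPUT_RESOLUTION = {        # (height, width) of reshaped multi-view images
        "nuscenes": (256, 704),
        "lyft": (384, 704),
        "waymo": (320, 704),
    }
    PRETRAIN_EPOCHS = 24        # following DG-BEV [14]
    LEARNING_RATE = 2e-4        # AdamW learning rate in the pre-training phase

    def build_pretrain_optimizer(model: torch.nn.Module) -> AdamW:
        """AdamW optimizer with the learning rate reported for pre-training."""
        return AdamW(model.parameters(), lr=LEARNING_RATE)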