Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection
Authors: Gyusam Chang, Jiwon Lee, Donghyun Kim, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sujin Jang, Sangpil Kim
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the robustness of UDGA with large-scale benchmarks: nuScenes, Lyft, and Waymo, where our framework outperforms the current state-of-the-art methods. |
| Researcher Affiliation | Collaboration | ¹Korea University, ²Samsung Advanced Institute of Technology |
| Pseudocode | No | The paper does not contain a dedicated section, figure, or block labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | We plan to release source code upon acceptance. At the current phase, we have provided details of implementations and necessary references to prior works in Sec. 4 (Experiment) and in Appendix A, B. |
| Open Datasets | Yes | Given landmark datasets in 3DOD, nuScenes [6], Lyft [7], and Waymo [5], we validate the effectiveness of our UDGA framework for the camera-based multi-view 3DOD task. |
| Dataset Splits | Yes | The nuScenes dataset covers 28k annotated samples for training, while the validation and test splits contain 6k samples each. |
| Hardware Specification | Yes | The training takes approximately 18 hours using one A100 GPU. |
| Software Dependencies | No | The paper mentions BEVDepth and BEVFormer as base detectors and ResNet50 as backbone, but does not provide specific version numbers for software dependencies like PyTorch, CUDA, or other libraries. |
| Experiment Setup | Yes | In BEVDepth, we reshape multi-view input image resolutions as follows: [256, 704] for nuScenes, [384, 704] for Lyft, and [320, 704] for Waymo. Following DG-BEV [14], we train for 24 epochs with the AdamW optimizer at a learning rate of 2e-4 in the pre-training phase. (See the configuration sketch below the table.) |
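
The Experiment Setup row above pins down the key pre-training hyperparameters: per-dataset input resolutions, a 24-epoch schedule, and AdamW at a learning rate of 2e-4. Below is a minimal sketch of how that configuration could be expressed in PyTorch. Since the authors' code is unreleased, the dictionary keys and the `build_resize`/`build_optimizer` helpers are illustrative assumptions, not their actual implementation.

```python
import torch
from torch.optim import AdamW
from torchvision.transforms import Resize

# Per-dataset multi-view input resolutions (height, width) quoted from the paper.
INPUT_RESOLUTIONS = {
    "nuscenes": (256, 704),
    "lyft": (384, 704),
    "waymo": (320, 704),
}

NUM_EPOCHS = 24       # pre-training schedule, following DG-BEV [14]
LEARNING_RATE = 2e-4  # pre-training learning rate reported in the paper

def build_resize(dataset: str) -> Resize:
    """Resize multi-view camera images to the dataset-specific input resolution."""
    return Resize(INPUT_RESOLUTIONS[dataset])

def build_optimizer(model: torch.nn.Module) -> AdamW:
    """AdamW optimizer configured with the quoted pre-training learning rate."""
    return AdamW(model.parameters(), lr=LEARNING_RATE)
```

In use, this would amount to something like `opt = build_optimizer(detector)` inside a standard 24-epoch training loop. Weight decay and learning-rate schedule are not specified in the quoted excerpt, so they are omitted here rather than guessed.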