UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior
Authors: Yao Wu, Mingwei Xing, Yachao Zhang, Xiaotong Luo, Yuan Xie, Yanyun Qu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5%/54.4% mIoU on A2D2/sKITTI for domain adaptive/generalized tasks. |
| Researcher Affiliation | Academia | (1) School of Informatics, Xiamen University; (2) Institute of Artificial Intelligence, Xiamen University; (3) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; (4) School of Computer Science and Technology, East China Normal University; (5) Chongqing Institute of East China Normal University |
| Pseudocode | No | The paper describes methods and architectures but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is available at https://github.com/Barcaaaa/UniDSeg. |
| Open Datasets | Yes | For evaluation, we use four public autonomous driving benchmarks, including three real datasets: nuScenes [4], SemanticKITTI [3], A2D2 [12], and one synthetic dataset: Virtual KITTI [11]. |
| Dataset Splits | Yes | The split details are tabulated in Tab. 8 ("Size of the splits in frames for all proposed cross-domain learning scenarios"), which lists, per scenario, the source train size, the target train size, the target val/test size, and the shared categories. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | We utilize the MMSegmentation [8] codebase for the decoder head, Semantic FPN [22]... For the 3D backbone, we employ SparseConvNet [15] with a U-Net architecture in the Sparse Convolution Library [9]. (Key libraries are named, but no version numbers are pinned; a hedged config sketch follows the table.) |
| Experiment Setup | Yes | Our model is trained on nuScenes: Day/Night, A2D2/sKITTI, and A2D2/nuScenes for 100k iterations. We utilize an iteration-based learning schedule where the initial learning rate is set to 1e-3, except for the 2D encoder, for which it is 1e-4; it is then divided by 10 at 80k and 90k iterations. ... The batch size is set to 8. As for the hyper-parameters, following [19], λ_S and λ_T in the cross-modal loss are set to 1.0 and 0.1 on nuScenes: Day/Night, nuScenes: USA/Sing., and nuScenes: Sing./USA, and to 0.1 and 0.01 on vKITTI/sKITTI, A2D2/sKITTI, and A2D2/nuScenes, respectively, without any fine-tuning of these values. (A training-setup sketch follows the table.) |
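
The decoder head quoted in the Software Dependencies row is MMSegmentation's Semantic FPN implementation (`FPNHead`). As a rough illustration of how such a head is declared in an MMSegmentation config, here is a minimal sketch; the channel widths, strides, and class count are illustrative assumptions, not values taken from the UniDSeg release.

```python
# Minimal MMSegmentation-style declaration of a Semantic FPN decode head.
# All numeric values below are illustrative assumptions, not the settings
# from the released UniDSeg code.
decode_head = dict(
    type='FPNHead',                    # Semantic FPN head in MMSegmentation
    in_channels=[256, 256, 256, 256],  # per-level feature channels from the neck
    in_index=[0, 1, 2, 3],             # which feature levels to consume
    feature_strides=[4, 8, 16, 32],    # stride of each input feature map
    channels=128,                      # width before the final classifier
    dropout_ratio=0.1,
    num_classes=10,                    # e.g., the shared label set of one scenario
    norm_cfg=dict(type='BN', requires_grad=True),
    align_corners=False,
    loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
)
```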
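
The Experiment Setup row fully pins down the optimization schedule: 100k iterations, base learning rate 1e-3 (1e-4 for the 2D encoder), both divided by 10 at 80k and 90k iterations, batch size 8, and per-scenario cross-modal loss weights λ_S/λ_T following xMUDA [19]. A minimal PyTorch sketch of that schedule follows; the module and loss names are placeholders, and the optimizer family (SGD here) is an assumption, since the quote does not name one.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the real networks (the VFM-based 2D
# encoder vs. everything else: decoder head, 3D backbone, learnable prompts).
encoder_2d = nn.Linear(8, 8)
other_modules = nn.Linear(8, 8)

# Two parameter groups: 1e-4 for the 2D encoder, 1e-3 for everything else.
# The optimizer family is an assumption; the quote only specifies the LRs.
optimizer = torch.optim.SGD(
    [
        {"params": encoder_2d.parameters(), "lr": 1e-4},
        {"params": other_modules.parameters(), "lr": 1e-3},
    ],
    lr=1e-3,
)

# Iteration-based schedule: divide both LRs by 10 at 80k and 90k of 100k iters.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80_000, 90_000], gamma=0.1)

# Per-scenario cross-modal loss weights (lambda_S, lambda_T) as quoted above.
XM_WEIGHTS = {
    "nuScenes: Day/Night": (1.0, 0.1),
    "nuScenes: USA/Sing.": (1.0, 0.1),
    "nuScenes: Sing./USA": (1.0, 0.1),
    "vKITTI/sKITTI": (0.1, 0.01),
    "A2D2/sKITTI": (0.1, 0.01),
    "A2D2/nuScenes": (0.1, 0.01),
}
lam_s, lam_t = XM_WEIGHTS["A2D2/sKITTI"]

for it in range(100_000):  # batch size would be 8 in the real data loader
    x = torch.randn(8)
    # Dummy losses so the sketch runs; the real ones come from the 2D/3D model.
    seg_loss = other_modules(encoder_2d(x)).pow(2).mean()
    xm_loss_src, xm_loss_tgt = seg_loss, seg_loss
    loss = seg_loss + lam_s * xm_loss_src + lam_t * xm_loss_tgt
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # stepped per iteration, matching the 80k/90k milestones
```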