UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior

Authors: Yao Wu, Mingwei Xing, Yachao Zhang, Xiaotong Luo, Yuan Xie, Yanyun Qu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5%/54.4% mIoU on A2D2/sKITTI for domain adaptive/generalized tasks.
Researcher Affiliation | Academia | (1) School of Informatics, Xiamen University; (2) Institute of Artificial Intelligence, Xiamen University; (3) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; (4) School of Computer Science and Technology, East China Normal University; (5) Chongqing Institute of East China Normal University
Pseudocode | No | The paper describes methods and architectures but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code is available at https://github.com/Barcaaaa/UniDSeg.
Open Datasets | Yes | For evaluation, we use four public autonomous driving benchmarks, including three real datasets: nuScenes [4], SemanticKITTI [3], and A2D2 [12], and one synthetic dataset: Virtual KITTI [11].
Dataset Splits | Yes | The split details are tabulated in Tab. 8, 'Size of the splits in frames for all proposed cross-domain learning scenarios', whose columns are: Scenario, Source Train, Target Train, Target Val/Test, and Categories.
Hardware Specification | Yes | All experiments are conducted on NVIDIA RTX 3090.
Software Dependencies | No | The paper names its main software components but does not specify version numbers: 'We utilize the MMSegmentation [8] codebase for the decoder head, Semantic FPN [22]... For the 3D backbone, we employ SparseConvNet [15] with a U-Net architecture in the Sparse Convolution Library [9].' A hedged config sketch illustrating the 2D decode head follows the table.
Experiment Setup | Yes | Our model is trained on nuScenes:Day/Night, A2D2/sKITTI, and A2D2/nuScenes for 100k iterations. We utilize an iteration-based learning schedule where the initial learning rate is set to 1e-3, except for the 2D encoder, whose rate is 1e-4; the learning rate is then divided by 10 at 80k and 90k iterations. ... The batch size is set to 8. Regarding the hyper-parameters, following [19], λS and λT in the cross-modal loss are set to 1.0 and 0.1 on nuScenes:Day/Night, nuScenes:USA/Sing., and nuScenes:Sing./USA, and to 0.1 and 0.01 on vKITTI/sKITTI, A2D2/sKITTI, and A2D2/nuScenes, respectively, without any fine-tuning of these values. A minimal sketch of this optimization schedule appears after the table.
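
To make the dependency row concrete: below is a minimal, hypothetical sketch of how a Semantic FPN decode head is commonly declared in an MMSegmentation-style Python config. Every concrete value (backbone type, channel widths, strides, num_classes) is an illustrative assumption, not a number reported in the paper.

    # Hypothetical MMSegmentation-style config for a Semantic FPN decode head.
    # All numbers below are placeholders, not values from the paper.
    model = dict(
        type='EncoderDecoder',
        backbone=dict(type='VisionTransformer'),  # stand-in for the VFM-based 2D encoder (assumption)
        neck=dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],   # placeholder feature widths
            out_channels=256,
            num_outs=4,
        ),
        decode_head=dict(
            type='FPNHead',                       # MMSegmentation's Semantic FPN head
            in_channels=[256, 256, 256, 256],
            in_index=[0, 1, 2, 3],
            feature_strides=[4, 8, 16, 32],
            channels=128,
            num_classes=10,                       # placeholder; depends on the benchmark
        ),
    )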
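
For the experiment-setup row: a minimal PyTorch sketch of the quoted optimization schedule. The learning rates, decay milestones, iteration count, batch size, and λS/λT values come from the quote; the optimizer choice (SGD with momentum) and the placeholder Net module are assumptions for illustration.

    import torch
    import torch.nn as nn

    # Placeholder module standing in for the 2D/3D network; only the split
    # between the 2D encoder and everything else matters here (assumption).
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder_2d = nn.Linear(8, 8)  # stand-in for the VFM-based 2D encoder
            self.rest = nn.Linear(8, 8)        # stand-in for decoder head + 3D backbone

    model = Net()
    encoder_ids = {id(p) for p in model.encoder_2d.parameters()}
    other_params = [p for p in model.parameters() if id(p) not in encoder_ids]

    # Initial lr 1e-3 everywhere except the 2D encoder (1e-4), per the quote.
    optimizer = torch.optim.SGD([
        {'params': other_params, 'lr': 1e-3},
        {'params': model.encoder_2d.parameters(), 'lr': 1e-4},
    ], momentum=0.9)  # SGD and the momentum value are assumptions, not from the paper

    # Divide the learning rates by 10 at 80k and 90k of the 100k iterations
    # (applying the decay to both parameter groups is an assumption).
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80_000, 90_000], gamma=0.1)

    # Scenario-dependent cross-modal loss weights from the quote:
    lambda_s, lambda_t = 1.0, 0.1  # nuScenes scenarios; 0.1 / 0.01 for the vKITTI/A2D2 scenarios

    for step in range(100_000):  # batch size 8 per iteration, per the quote
        # ... forward pass, cross-modal loss weighted by lambda_s / lambda_t,
        # backward pass, optimizer.step(), optimizer.zero_grad() ...
        scheduler.step()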