Cross-Modal Contrastive Learning for Domain Adaptation in 3D Semantic Segmentation

Authors: Bowei Xing, Xianghua Ying, Ruibin Wang, Jinfa Yang, Taiyan Chen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on three unsupervised domain adaptation scenarios, including country-to-country, day-to-night, and dataset-to-dataset. Experimental results show that our approach outperforms existing methods, which demonstrates the effectiveness of the proposed method.
Researcher Affiliation | Academia | Key Laboratory of Machine Perception (MOE), School of Intelligence Science and Technology, Peking University. {xingbowei, xhying, robin wang, jinfayang}@pku.edu.cn, chenty@stu.pku.edu.cn
Pseudocode | No | The paper includes architectural diagrams (Figures 1 and 2) but does not contain any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | Yes | Three autonomous driving datasets are adopted: nuScenes (Caesar et al. 2020), A2D2 (Geyer et al. 2020), and SemanticKITTI (Behley et al. 2019).
Dataset Splits | No | The paper mentions using the nuScenes, A2D2, and SemanticKITTI datasets and adaptation scenarios such as USA/Singapore and Day/Night, but it does not explicitly state the training, validation, and test splits needed for reproduction.
Hardware Specification | No | The paper mentions an 'empirical GPU memory concern' but does not provide any specific details about the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using U-Net and SparseConvNet as backbones but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | In the training process, the learning rate is set to 0.001 initially and is divided by 10 at 80k and 90k iterations. We train the model for 100k iterations in total on each adaptation scenario. For the neighborhood features, we adopt the nearby 5 × 5 region. For dilated neighbor features, we sample features from the nearby 9 × 9 region with a dilation rate of 2, which also yields 25 features for each pixel. The batch size is set to 8 for USA/Singapore and Day/Night, and 6 for A2D2/SemanticKITTI. Due to the GPU memory limitation, 30% of the features in each minibatch are sampled to calculate the contrastive loss in the former two scenarios, and 20% in the last.
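
The reported setup reduces to three simple pieces: a step learning-rate schedule (divide by 10 at 80k and 90k iterations), a dilated 9 × 9 neighborhood that keeps 25 offsets per pixel, and random subsampling of features before the contrastive loss. The sketch below illustrates these pieces in PyTorch under stated assumptions; the function names, the use of Adam, and the overall structure are hypothetical, since the paper's quoted text does not specify an optimizer or release an implementation.

```python
import torch

# Learning-rate schedule as reported: start at 1e-3, divide by 10 at
# 80k and 90k iterations, train for 100k iterations in total.
# (Adam is an assumption; the paper does not name the optimizer.)
def build_optimizer_and_scheduler(model):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80_000, 90_000], gamma=0.1)
    return optimizer, scheduler

# Dilated neighborhood offsets: sampling every other pixel of a 9x9
# window (dilation rate 2) gives 5x5 = 25 neighbors per pixel, the same
# count as the plain 5x5 neighborhood.
def dilated_offsets(kernel_size=9, dilation=2):
    half = kernel_size // 2
    coords = torch.arange(-half, half + 1, dilation)  # e.g. [-4, -2, 0, 2, 4]
    dy, dx = torch.meshgrid(coords, coords, indexing="ij")
    return torch.stack([dy.reshape(-1), dx.reshape(-1)], dim=1)  # (25, 2)

# Random subsampling of per-point features before the contrastive loss,
# keeping 30% (USA/Singapore, Day/Night) or 20% (A2D2/SemanticKITTI).
def subsample_features(features, keep_ratio=0.3):
    n = features.shape[0]
    keep = max(1, int(n * keep_ratio))
    idx = torch.randperm(n, device=features.device)[:keep]
    return features[idx]
```

As a usage note, the batch sizes reported (8 for USA/Singapore and Day/Night, 6 for A2D2/SemanticKITTI) would simply be passed to the data loaders; they do not affect the three helpers sketched above.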