Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

Authors: Senqiao Yang, Jiarui Wu, Jiaming Liu, Xiaoqi Li, Qizhe Zhang, Mingjie Pan, Yulu Gan, Zehui Chen, Shanghang Zhang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments were conducted on widely-used TTA and continual TTA benchmarks, and our proposed method achieves state-of-the-art performance in both semantic segmentation and depth estimation tasks.
Researcher Affiliation | Academia | 1) National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; 2) University of Science and Technology of China
Pseudocode | No | The overall framework of our method is shown in Fig. 2, and the specially designed prompt Placement and Updating methods are introduced in the following.
Open Source Code | No | No explicit statement or link regarding open-source code availability for the described methodology was found in the paper.
Open Datasets | Yes | Cityscapes-to-ACDC is designed for semantic segmentation cross-domain learning. And we conduct four TTA and one CTTA experiment on the scenario. The source model is an off-the-shelf pre-trained segmentation model that was trained on the Cityscapes dataset (Cordts et al. 2016). The ACDC dataset (Sakaridis, Dai, and Van Gool 2021) contains images collected in four different unseen visual conditions: Fog, Night, Rain, and Snow. ... KITTI-to-Driving Stereo. ... The source model employed is an off-the-shelf, pre-trained model, initially trained on the KITTI dataset (Geiger, Lenz, and Urtasun 2012). The Driving Stereo (Yang et al. 2019) comprises images collected under four disparate, unseen visual conditions: foggy, rainy, sunny, and cloudy.
Dataset Splits | No | Test Time Adaptation (TTA) (Liang, He, and Tan 2023) aims at adapting a pre-trained model with parameters trained on the source data (XS, YS) to multiple unlabeled target data distributions XT1, XT2, ..., XTn at inference time. The entire process cannot access any source domain data and can only access target domain data once.
Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs.
Software Dependencies | No | The optimizer is performed using Adam optimizer (Kingma and Ba 2014) with (β1, β2) = (0.9, 0.999). We set the learning rate to specific values for each backbone, such as 3e-4 for Segformer and 1e-4 for DPT, and batch size 1 for both TTA and CTTA experiments.
Experiment Setup | Yes | The optimizer is performed using Adam optimizer (Kingma and Ba 2014) with (β1, β2) = (0.9, 0.999). We set the learning rate to specific values for each backbone, such as 3e-4 for Segformer and 1e-4 for DPT, and batch size 1 for both TTA and CTTA experiments. All experiments are conducted on NVIDIA A100 GPUs.
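The reported experiment setup (Adam with (β1, β2) = (0.9, 0.999), learning rate 3e-4 for Segformer and 1e-4 for DPT, batch size 1) can be captured in a minimal configuration sketch. The helper name and backbone keys below are illustrative assumptions, not code from the paper:

```python
# Sketch of the per-backbone optimizer settings reported in the paper.
# The function and dictionary names are hypothetical; only the numeric
# hyperparameters come from the paper's experiment setup.

# Learning rates differ per backbone; the other Adam settings are shared.
BACKBONE_LR = {"segformer": 3e-4, "dpt": 1e-4}

def adam_config(backbone: str) -> dict:
    """Return the reported Adam hyperparameters for a given backbone."""
    return {
        "lr": BACKBONE_LR[backbone.lower()],
        "betas": (0.9, 0.999),  # (beta1, beta2) as in Kingma and Ba (2014)
        "batch_size": 1,        # used for both TTA and CTTA experiments
    }

print(adam_config("Segformer")["lr"])  # -> 0.0003
```

In a PyTorch-style setup, such a dictionary would typically be passed on to the optimizer constructor (e.g. `torch.optim.Adam(model.parameters(), lr=cfg["lr"], betas=cfg["betas"])`).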