When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation

Authors: Xinhong Ma, Yiming Wang, Hao Liu, Tianyu Guo, Yunhe Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that Uni-UVPT achieves state-of-the-art performance on GTA5 → Cityscapes and SYNTHIA → Cityscapes tasks
Researcher Affiliation | Industry | Xinhong Ma, Yiming Wang, Hao Liu, Tianyu Guo, Yunhe Wang. Huawei Noah's Ark Lab. {maxinhong, wangyiming22, liuhao296, tianyu.guo, yunhe.wang}@huawei.com
Pseudocode | No | The paper describes its methods textually and with architectural diagrams (Figure 1), but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/uni-uvpt and https://github.com/huawei-noah/noah-research/tree/master/uni-uvpt.
Open Datasets | Yes | GTA5 [37] is a large-scale driving-game dataset... The realistic dataset Cityscapes [10] collects street view scenes... Synthia dataset [38] is rendered from a virtual city...
Dataset Splits | Yes | The realistic dataset Cityscapes [10] collects street view scenes from 50 different cities with 19 classes, including 2,975 training images and 500 validation images.
Hardware Specification | Yes | Our approach is implemented based on the MMSegmentation framework [9] and one training task requires one NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions using the 'MMSegmentation framework [9]' and specific models like 'Swin-B [31]' and 'MiT-B5 [43]', but does not provide specific version numbers for these software components or other dependencies (e.g., Python, PyTorch versions).
Experiment Setup | Yes | Specifically, the learning rate of Swin-B encoder is set as 6 × 10^-6 and 4 × 10^-6 for the MiT-B5 encoder, while the learning rates of the segmentation head and prompt adapter are respectively set as ten and five times of backbone. Our Uni-UVPT framework typically needs 40k-80k iterations with a batch size of 1 until convergence.
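
The learning-rate scheme quoted in the Experiment Setup row can be worked out per component. The following is a minimal sketch, not the authors' configuration: the dictionary keys, constant names, and the function `component_learning_rates` are hypothetical, and only the numeric values (backbone rates for Swin-B and MiT-B5, and the 10x and 5x multipliers) are taken from the quoted text.

```python
# Illustrative only: names are hypothetical; numeric values come from the
# Experiment Setup row (backbone rates plus 10x / 5x multipliers).
BACKBONE_LR = {
    "swin_b": 6e-6,  # Swin-B encoder
    "mit_b5": 4e-6,  # MiT-B5 encoder
}
HEAD_LR_MULT = 10.0            # segmentation head: ten times the backbone rate
PROMPT_ADAPTER_LR_MULT = 5.0   # prompt adapter: five times the backbone rate


def component_learning_rates(backbone: str) -> dict:
    """Return the learning rate of each component for the given backbone."""
    base = BACKBONE_LR[backbone]
    return {
        "backbone": base,
        "segmentation_head": base * HEAD_LR_MULT,
        "prompt_adapter": base * PROMPT_ADAPTER_LR_MULT,
    }


if __name__ == "__main__":
    for name in BACKBONE_LR:
        print(name, component_learning_rates(name))
```

In a framework such as MMSegmentation these values would normally be expressed as per-parameter-group multipliers in the optimizer configuration; the exact parameter-name keys depend on the model definition and are not given in the source.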