Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation

Authors: Qi Bi, Jingjun Yi, Hao Zheng, Haolan Zhan, Yawen Huang, Wei Ji, Yuexiang Li, Yefeng Zheng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on various DGSS settings show the state-of-the-art performance of our FADA and its versatility to a variety of VFMs.
Researcher Affiliation | Collaboration | Qi Bi (1), Jingjun Yi (2), Hao Zheng (2), Haolan Zhan (3), Yawen Huang (2), Wei Ji (4), Yuexiang Li (5), Yefeng Zheng (1); (1) Westlake University, China; (2) Jarvis Research Center, Tencent Youtu Lab, China; (3) Monash University, Australia; (4) Yale University, United States; (5) University of Macau, Macau
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/BiQiWHU/FADA.
Open Datasets | Yes | Five driving-scene semantic segmentation datasets that share 19 common scene categories are used for validation: Cityscapes (C) [16], BDD-100K (B) [76], Mapillary (M) [51], SYNTHIA (S) [64], and GTA5 (G) [63].
Dataset Splits | Yes | Following the evaluation protocol of existing DGSS methods [55, 56, 14, 58], one dataset is used as the source domain for training and the remaining four serve as unseen target domains for validation.
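
This leave-one-domain-out protocol is straightforward to express in code. A minimal Python sketch follows; the dataset names match the paper, but `train_fn` and `evaluate_fn` are hypothetical placeholder callables, not part of the released FADA code:

```python
# Leave-one-domain-out DGSS protocol: train on a single source
# dataset, then evaluate zero-shot on the remaining four.
DATASETS = ["Cityscapes", "BDD-100K", "Mapillary", "SYNTHIA", "GTA5"]

def run_dgss_protocol(train_fn, evaluate_fn):
    """Train on each dataset in turn; test on the other four unseen domains."""
    results = {}
    for source in DATASETS:
        model = train_fn(source)  # fit on the single source domain only
        results[source] = {
            target: evaluate_fn(model, target)  # e.g. mIoU on the unseen domain
            for target in DATASETS
            if target != source
        }
    return results
```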
Hardware Specification | No | The paper does not specify the GPU or CPU models, memory capacity, or cloud/cluster resources used for the experiments.
Software Dependencies | No | The paper mentions using DINO-V2 and a Mask2Former segmentation decoder but does not provide version numbers for these or for other software dependencies (e.g., Python, PyTorch).
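
Reproducers therefore need to pin versions themselves. As one hedged example, the official DINOv2 backbones can be loaded via torch.hub; the ViT-L/14 variant below is an assumption, since the paper does not state which model size or library versions were used:

```python
import torch

# Load a DINOv2 backbone from the official repository via torch.hub.
# The ViT-L/14 variant is an assumption; the paper gives no model size
# and no version numbers for any dependency.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
backbone.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 518, 518)  # spatial size chosen to suit the 14-px patches
    features = backbone.forward_features(x)["x_norm_patchtokens"]
    print(features.shape)            # (1, 37 * 37, 1024) patch-token features
```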
Experiment Setup | Yes | Images are resized to 512 × 512 pixels before being input to the model. The Adam optimizer with an initial learning rate of 1 × 10⁻⁴ is used to train the model, and training terminates after 20 epochs.
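
For reference, a minimal PyTorch sketch of the stated configuration. Only the 512 × 512 input size, the Adam optimizer with initial learning rate 1 × 10⁻⁴, and the 20-epoch budget come from the paper; the one-layer model, dummy data, and cross-entropy loss are placeholder assumptions:

```python
import torch
from torch import nn, optim

model = nn.Conv2d(3, 19, kernel_size=1)              # stand-in for FADA + decoder
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # Adam, initial lr 1e-4 (from paper)
criterion = nn.CrossEntropyLoss(ignore_index=255)    # common choice for segmentation

# Dummy source-domain batch: images resized to 512 x 512, 19 classes.
train_loader = [(torch.randn(2, 3, 512, 512),
                 torch.randint(0, 19, (2, 512, 512)))]

for epoch in range(20):                              # training stops after 20 epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```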