Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation
Authors: Qi Bi, Jingjun Yi, Hao Zheng, Haolan Zhan, Yawen Huang, Wei Ji, Yuexiang Li, Yefeng Zheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on various DGSS settings show the state-of-the-art performance of our FADA and its versatility to a variety of VFMs. |
| Researcher Affiliation | Collaboration | Qi Bi¹, Jingjun Yi², Hao Zheng², Haolan Zhan³, Yawen Huang², Wei Ji⁴, Yuexiang Li⁵, Yefeng Zheng¹. ¹Westlake University, China; ²Jarvis Research Center, Tencent Youtu Lab, China; ³Monash University, Australia; ⁴Yale University, United States; ⁵University of Macau, Macau |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/BiQiWHU/FADA. |
| Open Datasets | Yes | Five driving-scene semantic segmentation datasets that share 19 common scene categories are used for validation. Specifically, Cityscapes (C) [16]... BDD-100K (B) [76]... Mapillary (M) [51]... SYNTHIA (S) [64]... GTA5 (G) [63]... |
| Dataset Splits | Yes | Following the evaluation protocol of existing DGSS methods [55, 56, 14, 58], each dataset is used in turn as the source domain for training, and the remaining four serve as unseen target domains for validation (see the protocol sketch after this table). |
| Hardware Specification | No | The paper does not specify the exact GPU or CPU models, memory, or specific cloud/cluster resources used for the experiments. |
| Software Dependencies | No | The paper mentions using 'DINO-V2' and 'Mask2Former segmentation decoder' but does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | The images are resized to 512 × 512 pixels before being input to the models. The Adam optimizer with an initial learning rate of 1 × 10⁻⁴ is used to train the model, and training terminates after 20 epochs. A minimal training-loop sketch follows this table. |
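
The leave-one-out protocol quoted in the Dataset Splits row is easy to make concrete. Below is a minimal Python sketch of the split generation, assuming the paper's single-letter dataset shorthand; the function name `leave_one_out_splits` is hypothetical and not taken from the paper or its released code.

```python
# Leave-one-out DGSS splits: each dataset serves once as the source domain,
# with the remaining four held out as unseen target domains.
DATASETS = ["C", "B", "M", "S", "G"]  # Cityscapes, BDD-100K, Mapillary, SYNTHIA, GTA5

def leave_one_out_splits(datasets=DATASETS):
    """Yield (source, targets) pairs for the five DGSS settings."""
    for source in datasets:
        yield source, [d for d in datasets if d != source]

for source, targets in leave_one_out_splits():
    print(f"train: {source}  evaluate: {', '.join(targets)}")
```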
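The reported setup (512 × 512 inputs, Adam at an initial learning rate of 1 × 10⁻⁴, 20 epochs) maps onto a standard PyTorch training loop. The sketch below uses a one-layer placeholder model and random tensors in place of the actual DINO-V2 backbone, Mask2Former decoder, and source-domain data loader, none of which are reproduced here; treat it only as an illustration of the stated hyperparameters.

```python
import torch
from torch import nn, optim

# Placeholder segmentor: the paper's actual model is a frequency-adapted
# DINO-V2 backbone with a Mask2Former decoder, not reproduced here.
model = nn.Conv2d(3, 19, kernel_size=1)              # 19 shared scene categories
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # initial LR from the paper
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for a source-domain loader; real images are
# resized to 512x512 pixels before being fed to the model.
images = torch.randn(2, 3, 512, 512)
labels = torch.randint(0, 19, (2, 512, 512))

for epoch in range(20):  # training terminates after 20 epochs
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```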