Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
Authors: Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Bo Li, Yang Tang, Pan Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities. |
| Researcher Affiliation | Collaboration | Ruihao Xia (1,2), Yu Liang (2), Peng-Tao Jiang (2), Hao Zhang (2), Bo Li (2), Yang Tang (1,3), Pan Zhou (4) — 1: East China University of Science and Technology, 2: vivo Mobile Communication Co., Ltd., 3: Peng Cheng Laboratory, 4: Singapore Management University |
| Pseudocode | No | The paper provides a framework diagram (Figure 2) but does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | We open-source our code and models at https://github.com/Xia-Rho/MADM. |
| Open Datasets | Yes | In our experiments, we adopt the Cityscapes-Image [13] dataset as the source modality and the DELIVER-Depth [5], FMB-Infrared [6], and DSEC-Event [7] datasets as the target modalities. |
| Dataset Splits | Yes | Cityscapes [13] is the source dataset in our experiments... split into 2,975 training images and 500 validation images... DELIVER [5]... contains 3,983/2,005/1,897 samples for training/validation/testing... |
| Hardware Specification | Yes | Experiments are conducted on an NVIDIA H800 GPU, occupying about 57 GB of memory. |
| Software Dependencies | No | The paper mentions using the Stable Diffusion v1-4 model and DAFormer components but does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train our MADM for 10k iterations with a batch size of 2 and an image resolution of 512 × 512. The optimization is instantiated with AdamW [45] with a learning rate of 5e-6. For hyperparameters β, γ, and λreg in DPLG and LPLR, we set them to {5000, 60, 1.0} / {8000, 50, 1.0} / {8000, 50, 10.0} for the depth/infrared/event modalities, respectively. |
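The reported experiment setup can be summarized as a small configuration sketch. This is an illustrative reconstruction of the hyperparameters quoted above, not code from the authors' repository; the names `TrainConfig`, `MODALITY_HYPERPARAMS`, and `get_config` are assumptions.

```python
# Hedged sketch of MADM's reported training hyperparameters.
# All class/variable names here are illustrative, not from the official repo.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Shared training settings reported in the paper."""
    iterations: int = 10_000     # 10k training iterations
    batch_size: int = 2
    resolution: int = 512        # images are 512 x 512
    learning_rate: float = 5e-6  # AdamW optimizer


# Per-modality (beta, gamma, lambda_reg) for DPLG and LPLR, as reported.
MODALITY_HYPERPARAMS = {
    "depth":    {"beta": 5000, "gamma": 60, "lambda_reg": 1.0},
    "infrared": {"beta": 8000, "gamma": 50, "lambda_reg": 1.0},
    "event":    {"beta": 8000, "gamma": 50, "lambda_reg": 10.0},
}


def get_config(modality: str) -> tuple[TrainConfig, dict]:
    """Return the shared training config plus modality-specific weights."""
    return TrainConfig(), MODALITY_HYPERPARAMS[modality]
```

For example, `get_config("event")` pairs the shared settings with the event-modality weights, where only `lambda_reg` differs from the infrared configuration.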