Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
Authors: Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Bo Li, Yang Tang, Pan Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities. |
| Researcher Affiliation | Collaboration | Ruihao Xia (1,2), Yu Liang (2), Peng-Tao Jiang (2), Hao Zhang (2), Bo Li (2), Yang Tang (1,3), Pan Zhou (4) — 1: East China University of Science and Technology, 2: vivo Mobile Communication Co., Ltd., 3: Peng Cheng Laboratory, 4: Singapore Management University |
| Pseudocode | No | The paper provides a framework diagram (Figure 2) but does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | We open-source our code and models at https://github.com/Xia-Rho/MADM. |
| Open Datasets | Yes | In our experiments, we adopt the Cityscapes-Image [13] dataset as the source modality and the DELIVER-Depth [5], FMB-Infrared [6], and DSEC-Event [7] datasets as the target modalities. |
| Dataset Splits | Yes | Cityscapes [13] is the source dataset in our experiments... split into 2,975 training images and 500 validation images... DELIVER [5]... contains 3,983/2,005/1,897 samples for training/validation/testing... |
| Hardware Specification | Yes | Experiments are conducted on an NVIDIA H800 GPU, occupying about 57 GB of memory. |
| Software Dependencies | No | The paper mentions using the Stable Diffusion v1-4 model and DAFormer components but does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train our MADM for 10k iterations with a batch size of 2 and an image resolution of 512 × 512. The optimization is instantiated with AdamW [45] with a learning rate of 5e-6. For hyperparameters β, γ, and λreg in DPLG and LPLR, we set them to {5000, 60, 1.0} / {8000, 50, 1.0} / {8000, 50, 10.0} for the depth/infrared/event modalities, respectively. |
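The reported experiment setup can be summarized as a small configuration sketch. This is an illustrative reconstruction of the hyperparameters quoted above, not code from the authors' repository; the names `TrainConfig`, `MODALITY_HYPERPARAMS`, and `get_config` are assumptions.

```python
# Hedged sketch of MADM's reported training hyperparameters.
# All class/variable names here are illustrative, not from the official repo.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Shared training settings reported in the paper."""
    iterations: int = 10_000     # 10k training iterations
    batch_size: int = 2
    resolution: int = 512        # images are 512 x 512
    learning_rate: float = 5e-6  # AdamW optimizer


# Per-modality (beta, gamma, lambda_reg) for DPLG and LPLR, as reported.
MODALITY_HYPERPARAMS = {
    "depth":    {"beta": 5000, "gamma": 60, "lambda_reg": 1.0},
    "infrared": {"beta": 8000, "gamma": 50, "lambda_reg": 1.0},
    "event":    {"beta": 8000, "gamma": 50, "lambda_reg": 10.0},
}


def get_config(modality: str) -> tuple[TrainConfig, dict]:
    """Return the shared training config plus modality-specific weights."""
    return TrainConfig(), MODALITY_HYPERPARAMS[modality]
```

For example, `get_config("event")` pairs the shared settings with the event-modality weights, where only `lambda_reg` differs from the infrared configuration.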