DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Authors: Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To showcase the power of the proposed approach, we generate datasets with rich, dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, it achieves (1) state-of-the-art results on semantic segmentation and instance segmentation; (2) significantly greater robustness in domain generalization than using the real data alone, as well as state-of-the-art results in the zero-shot segmentation setting; and (3) flexibility for efficient application and novel task composition (e.g., image editing).
Researcher Affiliation | Collaboration | 1 Zhejiang University, China; 2 University of Chinese Academy of Sciences, China; 3 Show Lab, National University of Singapore; 4 Ant Group
Pseudocode | No | The paper includes figures illustrating the framework and decoder architecture, but it does not contain a pseudocode block or an explicitly labeled algorithm.
Open Source Code | Yes | The project website is at: weijiawu.github.io/DatasetDM.
Open Datasets | Yes | Semantic Segmentation: Pascal-VOC 2012 [15] (20 classes) and Cityscapes [11] (19 classes), two classical benchmarks, are used for evaluation. ... Instance Segmentation: the COCO2017 [33] benchmark is used... Depth Estimation: a total of 80k synthetic images were synthesized for NYU Depth V2 [46]. ... Pose Estimation: a set of 30k synthetic images was generated for the COCO2017 Pose dataset [33]...
Dataset Splits | No | The paper discusses training and testing, and mentions using 'COCO val2017' in tables. However, it does not provide specific details on the dataset splits used for validation, such as percentages or sample counts for the training/validation/test sets.
Hardware Specification | Yes | For all tasks, we train DatasetDM for around 50k iterations with images of size 512×512, which needs only one Tesla V100 GPU and lasts for approximately 20 hours.
Software Dependencies | No | The paper mentions the 'Stable Diffusion V1 [41]' model and 'Mask2Former [8]' as architectures, and 'Optimizer [36]' as a reference, but it does not specify software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | For all tasks, we train DatasetDM for around 50k iterations with images of size 512×512... Optimizer [36] with a learning rate of 0.0001 is used.
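The reported setup (roughly 50k iterations at 512×512 resolution, learning rate 0.0001, one Tesla V100, about 20 hours) can be collected into a small configuration sketch. This is a hypothetical illustration, not code from the paper: the paper names its optimizer only via citation [36], so the `optimizer="AdamW"` field below is an assumption, and `DatasetDMTrainConfig` is an invented name.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DatasetDMTrainConfig:
    """Hypothetical config mirroring the training setup reported in the paper."""
    iterations: int = 50_000                    # "around 50k iterations"
    image_size: tuple = (512, 512)              # training image resolution
    learning_rate: float = 1e-4                 # "learning rate of 0.0001"
    optimizer: str = "AdamW"                    # assumption; paper cites only 'Optimizer [36]'
    gpus: int = 1                               # "only need one Tesla V100 GPU"
    approx_hours: float = 20.0                  # reported wall-clock training time

    def iterations_per_hour(self) -> float:
        # Rough throughput implied by the reported 20-hour run
        return self.iterations / self.approx_hours

cfg = DatasetDMTrainConfig()
print(cfg.iterations_per_hour())  # 2500.0
```

Such a frozen dataclass keeps the reported hyperparameters in one auditable place, which is useful when checking a reimplementation against the paper's stated setup.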