Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis

Authors: Juyeon Ko, Inho Kong, Dogyun Park, Hyunwoo J. Kim

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate that the proposed method generates high-quality samples through extensive experiments and analyses on benchmark datasets, including a novel experimental setup simulating human errors during real-world applications. We conduct extensive experiments and analyses on benchmark datasets and achieve competitive results." |
| Researcher Affiliation | Academia | "Juyeon Ko* 1, Inho Kong* 1, Dogyun Park 1, Hyunwoo J. Kim 1. 1 Department of Computer Science, Korea University, Republic of Korea. Correspondence to: Hyunwoo J. Kim <hyunwoojkim@korea.ac.kr>." |
| Pseudocode | Yes | "Appendix B: Algorithms. Algorithm 1 and 2 summarize the general training and sampling process of our SCDM, respectively." |
| Open Source Code | Yes | "Code is available at https://github.com/mlvlab/SCDM." |
| Open Datasets | Yes | "We evaluate our method based on the ADE20K (Zhou et al., 2017) dataset. More experimental results with other benchmark datasets (e.g., CelebAMask-HQ (Lee et al., 2020) and COCO-Stuff (Caesar et al., 2018)) are in Appendix G." |
| Dataset Splits | No | The paper specifies "20K images for training and 2K images for test" for ADE20K, but does not explicitly describe a validation split or its size. |
| Hardware Specification | Yes | "We trained our model with 4 NVIDIA RTX A6000 GPUs for 1-2 days. Image sampling and evaluations are conducted on a server with 8 NVIDIA RTX 3090 GPUs." |
| Software Dependencies | No | The paper mentions software such as AdamW and implies the use of deep learning frameworks, but does not give version numbers for any dependency (e.g., PyTorch, Python, or CUDA versions). |
| Experiment Setup | Yes | "For the hyperparameters, we used λ = 0.001 for our hybrid loss (Nichol & Dhariwal, 2021), classifier-free guidance (Ho & Salimans, 2021) scale s = 0.5, 20% of drop rate for the SIS experiments on three datasets, noise schedule hyperparameter η = 1, dynamic thresholding (Saharia et al., 2022) percentile of 0.95, and the extrapolation scale of w = 0.8." |
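The Experiment Setup row lists concrete hyperparameter values. A minimal sketch collecting them in code might look like the following; the key names and the `cfg_combine` helper are illustrative assumptions (the actual SCDM configuration format and guidance implementation may differ), and the guidance formula shown is the standard classifier-free extrapolation from Ho & Salimans (2021), not a quote from the paper:

```python
# Hypothetical config gathering the hyperparameters reported above.
# Key names are illustrative, not taken from the SCDM codebase.
scdm_config = {
    "hybrid_loss_lambda": 0.001,    # weight on the VLB term in the hybrid loss
    "cfg_scale": 0.5,               # classifier-free guidance scale s
    "cond_drop_rate": 0.2,          # 20% condition drop rate during training
    "noise_schedule_eta": 1.0,      # noise schedule hyperparameter eta
    "dynamic_threshold_pct": 0.95,  # dynamic thresholding percentile
    "extrapolation_w": 0.8,         # extrapolation scale w
}

def cfg_combine(eps_cond: float, eps_uncond: float, s: float) -> float:
    """Standard classifier-free guidance: extrapolate the conditional
    noise prediction away from the unconditional one by scale s."""
    return (1.0 + s) * eps_cond - s * eps_uncond
```

For example, with `s = 0.5` the combined prediction moves half again as far from the unconditional estimate toward the conditional one; at `s = 0` it reduces to the purely conditional prediction.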