Guidance with Spherical Gaussian Constraint for Conditional Diffusion

Authors: Lingxiao Yang, Shutong Ding, Yifan Cai, Jingyi Yu, Jingya Wang, Ye Shi

ICML 2024

Reproducibility assessment. Each entry lists a variable, its result, and the LLM's response:
Research Type (Experimental): Comprehensive experimental results in various conditional generation tasks validate the superiority and adaptability of DSG in terms of both sample quality and time efficiency.
Researcher Affiliation (Academia): 1 ShanghaiTech University; 2 MoE Key Laboratory of Intelligent Perception and Human-Machine Collaboration. Correspondence to: Ye Shi <shiye@shanghaitech.edu.cn>.
Pseudocode (Yes): Algorithm 1, "Diffusion with Spherical Gaussian constraint".
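Algorithm 1 itself is not reproduced in this summary. As a rough illustration of the spherical Gaussian constraint, the sketch below is a hand-written simplification (the function name `dsg_step` and the exact mixing formulation are assumptions, not the authors' code): the stochastic perturbation around the denoising mean is mixed with the steepest-descent direction of the guidance loss, then rescaled to the radius sqrt(n)·σ on which an n-dimensional Gaussian concentrates.

```python
import numpy as np

def dsg_step(mu, sigma, grad_loss, gr=0.1, rng=None):
    """One guided sampling step under a spherical Gaussian constraint (sketch).

    mu        : unconditional denoising mean at this step, shape (n,)
    sigma     : noise standard deviation at this step (scalar)
    grad_loss : gradient of the guidance loss w.r.t. x_t, shape (n,)
    gr        : guidance rate mixing the random and guided directions
    """
    rng = rng or np.random.default_rng(0)
    n = mu.size
    r = np.sqrt(n) * sigma                       # radius of the high-density shell of N(0, sigma^2 I)
    d_sample = sigma * rng.standard_normal(n)    # unguided stochastic direction
    d_star = -r * grad_loss / np.linalg.norm(grad_loss)  # loss-descent direction, scaled to the shell
    d_mix = (1 - gr) * d_sample + gr * d_star    # interpolate between the two
    d_mix = r * d_mix / np.linalg.norm(d_mix)    # project back onto the shell
    return mu + d_mix
```

Because the update always lands on the shell of radius sqrt(n)·σ, the guidance strength can be raised without drifting away from where the diffusion prior places its mass, which is the intuition behind the constraint.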
Open Source Code (Yes): Code is available at https://github.com/LingxiaoYang2023/DSG2024.
Open Datasets (Yes): The method is evaluated on 1k images from FFHQ 256x256 (Karras et al., 2019) and the ImageNet 256x256 validation set (Deng et al., 2009).
Dataset Splits (No): The paper mentions using specific datasets (FFHQ, the ImageNet validation set, the CelebA-HQ test set) for evaluation but does not explicitly provide the training/validation/test splits (e.g., percentages, counts, or direct references to the specific splits used in the experiments) for these datasets.
Hardware Specification (No): The paper does not specify the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies (No): The paper mentions software components and pre-trained models such as LDM (Rombach et al., 2022), the CLIP image encoder (Radford et al., 2021), a MobileNetV3-Large segmentation network, and ArcFace (Deng et al., 2019), but it does not specify version numbers for any of them.
Experiment Setup (Yes): For the linear inverse problem y = Ax + ε, including Inpainting, Super-resolution, and Gaussian deblurring, the method is evaluated on 1k images from FFHQ 256x256 (Karras et al., 2019) and the ImageNet 256x256 validation set (Deng et al., 2009), using pre-trained diffusion models taken from (Chung et al., 2023; Dhariwal & Nichol, 2021)... Noisy measurements are generated by introducing Gaussian noise ε ~ N(0, 0.05)... For Super-resolution, bicubic downsampling is employed to achieve a 4x reduction in resolution. (iii) For Gaussian deblurring, a Gaussian blur with a 61x61 kernel and a standard deviation of 3.0 is applied. The loss guidance can be expressed as Loss(x0, y) = ||A x̂0(x_t) - y||_2^2. The hyperparameters used for the linear inverse problems on FFHQ are shown in Table 4. For the linear inverse problems on ImageNet: gr = 0.2, i = 5 for Inpainting; gr = 0.1, i = 10 for Super-resolution; and gr = 0.1, i = 5 for Gaussian deblurring. For Style Guidance, Text-Style Guidance, and Text-Segmentation Guidance, gr = 0.1, i = 1. For FaceID Guidance, gr = 0.05, i = 1.
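As a concrete reading of the setup above, the snippet below forms the noisy measurement y = Ax + ε and evaluates the stated guidance loss. The helper names are hypothetical, and a pixel-masking operator stands in for the generic linear A (the paper's Inpainting case); the downsampling and blur operators would slot in the same way.

```python
import numpy as np

def make_measurement(x, A, noise_std=0.05, rng=None):
    """y = A x + eps, with Gaussian noise eps as in the paper's setup."""
    rng = rng or np.random.default_rng(0)
    y = A(x)
    return y + noise_std * rng.standard_normal(y.shape)

def guidance_loss(x0_hat, y, A):
    """Loss(x0, y) = ||A x0_hat - y||_2^2 on the predicted clean sample."""
    return float(np.sum((A(x0_hat) - y) ** 2))

# Example: inpainting, where A zeroes out the pixels to be filled in.
mask = np.ones(64)
mask[20:40] = 0.0                   # region to be inpainted
A = lambda x: mask * x

x_true = np.linspace(0.0, 1.0, 64)  # stand-in for a clean image
y = make_measurement(x_true, A)
loss = guidance_loss(x_true, y, A)  # small: only the observation noise remains
```

During sampling, this loss would be evaluated on the model's prediction x̂0(x_t) rather than on x_true, and its gradient with respect to x_t would drive the guidance.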