Guidance with Spherical Gaussian Constraint for Conditional Diffusion
Authors: Lingxiao Yang, Shutong Ding, Yifan Cai, Jingyi Yu, Jingya Wang, Ye Shi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experimental results in various conditional generation tasks validate the superiority and adaptability of DSG in terms of both sample quality and time efficiency. |
| Researcher Affiliation | Academia | 1ShanghaiTech University 2MoE Key Laboratory of Intelligent Perception and Human-Machine Collaboration. Correspondence to: Ye Shi <shiye@shanghaitech.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 Diffusion with Spherical Gaussian constraint |
| Open Source Code | Yes | Code is available at https://github.com/LingxiaoYang2023/DSG2024. |
| Open Datasets | Yes | we evaluate our method in 1k images of FFHQ 256×256 (Karras et al., 2019) and ImageNet 256×256 validation dataset (Deng et al., 2009) |
| Dataset Splits | No | The paper mentions the specific datasets used for evaluation (FFHQ, the ImageNet validation dataset, and the CelebA-HQ test set) but does not explicitly provide training/validation/test splits (e.g., percentages, counts, or direct references to the specific splits used in the experiments). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software components and pre-trained models such as LDM (Rombach et al., 2022), the CLIP (Radford et al., 2021) image encoder, a MobileNetV3-Large segmentation network, and ArcFace (Deng et al., 2019), but it does not specify version numbers for any of these. |
| Experiment Setup | Yes | For the linear inverse problem y = Ax + ϵ, including Inpainting, Super-resolution, and Gaussian deblurring, we evaluate our method in 1k images of FFHQ 256×256 (Karras et al., 2019) and ImageNet 256×256 validation dataset (Deng et al., 2009) using pre-trained diffusion models taken from (Chung et al., 2023; Dhariwal & Nichol, 2021)... We generate noisy measurements by introducing Gaussian noise ϵ ∼ N(0, 0.05)... For Super-resolution, we employ bicubic downsampling to achieve a 4× reduction in resolution. (iii) For Gaussian deblurring, we apply a 61×61 kernel size Gaussian blur with a standard deviation of 3.0. The loss guidance can be expressed as: Loss(x₀, y) = ‖A x̂₀(xₜ) − y‖₂². The hyperparameters we used for the linear inverse problem in FFHQ are shown in Table 4. For the linear inverse problem in ImageNet, we use gr = 0.2, i = 5 for Inpainting, gr = 0.1, i = 10 for Super-resolution, and gr = 0.1, i = 5 for Gaussian deblurring. For Style Guidance, Text-Style Guidance, and Text-Segmentation Guidance, we set gr = 0.1, i = 1. For Face ID Guidance, we set gr = 0.05, i = 1. |
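The quoted setup specifies the measurement model y = Ax + ϵ with ϵ ∼ N(0, 0.05) and the guidance loss Loss(x₀, y) = ‖A x̂₀(xₜ) − y‖₂². A minimal sketch of that loss computation is below; the 4× average-pooling operator `A` is an illustrative stand-in for the paper's bicubic downsampling, and the array shapes and function names are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def A(x):
    # Hypothetical linear operator for 4x super-resolution: average pooling
    # over 4x4 blocks (a simplification of the bicubic downsampling the
    # paper actually uses).
    h, w = x.shape
    return x.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

def guidance_loss(x0_hat, y):
    # Loss(x0, y) = ||A x_hat0(x_t) - y||_2^2, as quoted in the setup.
    return float(np.sum((A(x0_hat) - y) ** 2))

# Noisy measurement y = A x + eps, with Gaussian noise of sigma = 0.05.
x_true = rng.standard_normal((256, 256))      # placeholder "image"
y = A(x_true) + rng.normal(0.0, 0.05, size=(64, 64))

# Loss evaluated at some estimate of the clean sample x_hat0(x_t).
x0_hat = rng.standard_normal((256, 256))
loss = guidance_loss(x0_hat, y)
```

In the guided-sampling loop, the gradient of this loss with respect to the intermediate sample would supply the guidance direction; the quoted gr (guidance rate) and i values are the per-task hyperparameters applied at that step.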