Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation
Authors: Ruoyu Wang, Yongqi Yang, Zhihao Qian, Ye Zhu, Yu Wu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiment results show that our proposed COW can achieve more flexible customization based on strict visual conditions in different application settings. Project page: https://wangruoyu02.github.io/cow.github.io/ |
| Researcher Affiliation | Academia | 1 School of Computer Science, Wuhan University 2 Department of Computer Science, Princeton University {wangruoyu, yongqiyang, qianzhihao, wuyucs}@whu.edu.cn yezhu@princeton.edu |
| Pseudocode | Yes | We show the pseudo-code for our proposed Cyclic One-Way Diffusion approach in Algo. 1. Algorithm 1: Cyclic One-Way Diffusion (an illustrative sketch of the cycle appears after this table). |
| Open Source Code | No | Project page: https://wangruoyu02.github.io/cow.github.io/ (This is a project page, not a direct link to a code repository, and the paper does not explicitly state that the code is released.) |
| Open Datasets | Yes | To simulate the visual condition processing in real scenarios, we adopt face image masks from CelebAMask-HQ (Lee et al., 2020) as our visual condition. |
| Dataset Splits | No | The paper performs comparisons on a dataset, stating: 'We perform all comparisons with baselines on 200 images and 300 texts (100 texts for every setting, 1 text to 2 faces in order).', but it does not specify explicit training/validation/test splits for reproducibility, especially as the method is described as 'training-free'. |
| Hardware Specification | Yes | We use a single NVIDIA 4090 GPU to run experiments since our proposed COW method is training-free. |
| Software Dependencies | No | The paper mentions specific models like 'sd-v2-1-base' and 'sd-v1-5' for Stable Diffusion, but does not provide specific version numbers for general software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | We implement COW, SD inpainting, and ControlNet on the pre-trained T2I Stable Diffusion model (Rombach et al., 2022) sd-v2-1-base, with the default configuration: condition scale set to 7.5, noise level η set to 1, image size set to 512, 50-step generation (10 steps for COW), and the negative prompt set to 'a bad quality and low-resolution image, extra fingers, deformed hands'. Note that we implement DB and TI following their official code and use the highest supported Stable Diffusion version, sd-v1-5. During generation, we set the visual condition size to 256 and randomly choose a location in the upper half of the image to place the visual condition. We set x_t1 to step 25, x_t2 to step 35, and the cycle number to 10. We use slightly different settings for the three tasks: x_t3 = [4, 3] and η = 0 for the normal prompts, x_t3 = 4 and η = 0.1 for the attribute-editing prompts, and x_t3 = 4 and η = 1 for the style prompts. (A hedged configuration sketch follows the table.) |