Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
Authors: Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and superiority of our method for text-guided I2I are demonstrated with extensive experiments both qualitatively and quantitatively. |
| Researcher Affiliation | Academia | Wangxuan Institute of Computer Technology, Peking University, Beijing, China {gaoxiang1102, icey.x, liujiaying}@pku.edu.cn |
| Pseudocode | No | The paper provides architectural diagrams and schematics (Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our project is publicly available at: https://xianggao1102.github.io/FCDiffusion/. |
| Open Datasets | Yes | We use Stable Diffusion v2-1-base as the pre-trained LDM in our model, and use LAION-Aesthetics 6.5+, which contains 625K image-text pairs, as our dataset. |
| Dataset Splits | No | The paper mentions partitioning the dataset into a training set and a test set at a ratio of 9:1, but does not explicitly provide details for a validation set split. |
| Hardware Specification | Yes | Each frequency control branch in our model is separately finetuned for 100K iterations with batch size 4 on a single RTX 3090 Ti GPU. |
| Software Dependencies | No | The paper mentions software components such as Stable Diffusion v2-1-base, LDM, ControlNet, and OpenCLIP, but does not provide version numbers for its software dependencies or libraries. |
| Experiment Setup | Yes | We train at 512 × 512 image resolution, i.e., H = W = 512, h = w = 64. We set the initial learning rate as 1e-5. Each frequency control branch in our model is separately finetuned for 100K iterations with batch size 4 on a single RTX 3090 Ti GPU. All the results in this paper are generated using the DDIM (Song, Meng, and Ermon 2020) sampler with 50 steps. |
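
The reported setup can be collected into a small configuration object to make the derived quantities explicit. The following is a minimal sketch, not the authors' code: the class name `ReportedSetup`, its field names, and the helper `split_sizes` are hypothetical, and the dataset size of 625,000 is the paper's rounded "625K" figure, so the split counts are approximate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the table above (field names are illustrative)."""
    image_size: int = 512        # H = W = 512
    latent_size: int = 64        # h = w = 64
    learning_rate: float = 1e-5  # initial learning rate
    iterations: int = 100_000    # per frequency control branch
    batch_size: int = 4
    ddim_steps: int = 50         # sampler steps at inference

    @property
    def downsample_factor(self) -> int:
        # Spatial reduction implied by the image and latent sizes.
        return self.image_size // self.latent_size

    @property
    def samples_per_branch(self) -> int:
        # Training samples drawn while finetuning one branch.
        return self.iterations * self.batch_size


def split_sizes(n_pairs: int, train_parts: int = 9, test_parts: int = 1) -> Tuple[int, int]:
    """Split n_pairs by an integer ratio (9:1 as reported); integer math avoids rounding drift."""
    n_train = n_pairs * train_parts // (train_parts + test_parts)
    return n_train, n_pairs - n_train


setup = ReportedSetup()
n_train, n_test = split_sizes(625_000)  # 625K is the paper's rounded figure
print(setup.downsample_factor, setup.samples_per_branch, n_train, n_test)
```

Under these assumptions, each branch draws 400,000 training samples in total, fewer than the roughly 562,500 pairs in the training split, so a single branch would not complete one full pass over the training data.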