Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

Authors: Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness and superiority of our method for text-guided I2I are demonstrated with extensive experiments both qualitatively and quantitatively.
Researcher Affiliation | Academia | Wangxuan Institute of Computer Technology, Peking University, Beijing, China {gaoxiang1102, icey.x, liujiaying}@pku.edu.cn
Pseudocode | No | The paper provides architectural diagrams and schematics (Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our project is publicly available at: https://xianggao1102.github.io/FCDiffusion/.
Open Datasets | Yes | We use Stable Diffusion v2-1-base as the pre-trained LDM in our model, and use LAION-Aesthetics 6.5+ which contains 625K image-text pairs as our dataset.
Dataset Splits | No | The paper mentions partitioning the dataset into a training set and a test set at a ratio of 9:1, but does not describe a validation split.
Hardware Specification | Yes | Each frequency control branch in our model is separately finetuned for 100K iterations with batch size 4 on a single RTX 3090 Ti GPU.
Software Dependencies | No | The paper mentions software components such as Stable Diffusion v2-1-base, LDM, ControlNet, and OpenCLIP, but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | We train at 512 × 512 image resolution, i.e., H = W = 512, h = w = 64. We set the initial learning rate as 1e-5. Each frequency control branch in our model is separately finetuned for 100K iterations with batch size 4 on a single RTX 3090 Ti GPU. All the results in this paper are generated using the DDIM (Song, Meng, and Ermon 2020) sampler with 50 steps.
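For context, the 9:1 train/test partition noted under Dataset Splits could be reproduced with a simple random split. The paper does not publish the exact partitioning procedure, so the sketch below (including the hypothetical function name split_dataset) is only illustrative.

```python
import random

def split_dataset(samples, train_ratio=0.9, seed=0):
    """Split (image_path, caption) pairs into train/test sets at a 9:1 ratio.

    The paper reports a 9:1 train/test partition of LAION-Aesthetics 6.5+
    (~625K image-text pairs); the actual split procedure is not published,
    so this random split is an assumption, not the authors' method.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(len(samples) * train_ratio)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test
```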
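The Experiment Setup and Hardware Specification rows can be collected into a plain configuration summary. The field names below are illustrative and are not taken from the authors' code base.

```python
# Reported fine-tuning and sampling settings, gathered as a config dict.
# Values come from the paper's experiment setup; key names are hypothetical.
train_config = {
    "image_resolution": (512, 512),    # H = W = 512
    "latent_resolution": (64, 64),     # h = w = 64
    "initial_learning_rate": 1e-5,
    "batch_size": 4,
    "iterations_per_branch": 100_000,  # each frequency control branch finetuned separately
    "gpu": "1x NVIDIA RTX 3090 Ti",
    "sampler": "DDIM",
    "sampling_steps": 50,
}
```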
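Likewise, the reported sampling setup (Stable Diffusion v2-1-base, DDIM sampler, 50 steps, 512 × 512 output) could be exercised roughly as follows, assuming the Hugging Face diffusers library. This is a minimal sketch of the base model and sampler only; the FCDiffusion frequency control branches themselves are not reproduced here.

```python
# Minimal sketch: load the pre-trained LDM named in the paper and sample
# with DDIM for 50 steps at 512x512. Not the authors' implementation.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",  # pre-trained LDM used by the paper
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # switch to DDIM

image = pipe(
    "a watercolor painting of a mountain lake",  # illustrative prompt, not from the paper
    height=512,              # H = W = 512 (latent h = w = 64)
    width=512,
    num_inference_steps=50,  # 50 DDIM steps, as reported
).images[0]
image.save("ddim_50_steps.png")
```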