Real-World Image Variation by Aligning Diffusion Inversion Chain

Authors: Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experimental results demonstrate that our proposed approach outperforms existing methods in terms of semantic similarity and perceptual quality." |
| Researcher Affiliation | Collaboration | ¹The Chinese University of Hong Kong, ²SmartMore |
| Pseudocode | No | The paper describes its methods in equations and prose but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://rival-diff.github.io |
| Open Datasets | Yes | "Our study obtained a high-quality test set of reference images from the Internet and DreamBooth [12] to ensure a diverse image dataset." |
| Dataset Splits | No | The paper mentions a "test set" and notes that "evaluation samples are from two datasets," but gives no details on training/validation/test splits or percentages. |
| Hardware Specification | Yes | Experiments run on a single NVIDIA RTX 4090 GPU, taking 8 seconds to generate an image variation at batch size 1. |
| Software Dependencies | Yes | The baseline model is Stable Diffusion v1.5, sampled with DDIM (T = 50 steps) and a classifier-free guidance scale m = 7 (Eq. (8)). |
| Experiment Setup | Yes | DDIM sample steps T = 50 per image; classifier-free guidance scale m = 7 (Eq. (8)); the two stages are split at t_align = t_early = 30 for attention alignment (Eq. (3)) and latent alignment (Eq. (8)); the shuffle strategy of Eq. (6) initializes the starting latent X_T^G. |
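The two hyperparameters reported above (DDIM sampling with T = 50 steps and classifier-free guidance scale m = 7) can be illustrated numerically. The sketch below is a minimal NumPy toy, not the authors' code: `cfg_noise` and `ddim_step` are hypothetical helper names, and the random arrays stand in for the conditional and unconditional U-Net noise predictions of Stable Diffusion v1.5.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, m=7.0):
    # Classifier-free guidance: eps = eps_uncond + m * (eps_cond - eps_uncond).
    # m = 7 matches the guidance scale reported in the experiment setup.
    return eps_uncond + m * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    # Deterministic DDIM update (eta = 0): predict x0 from the current
    # latent and noise estimate, then re-noise to the previous timestep.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps

# Toy example: a (4, 64, 64) latent, the shape SD v1.5 uses for 512x512 images.
rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 64, 64))
eps_c = rng.standard_normal(x_t.shape)  # stand-in for the conditional prediction
eps_u = rng.standard_normal(x_t.shape)  # stand-in for the unconditional prediction

eps = cfg_noise(eps_u, eps_c, m=7.0)
x_prev = ddim_step(x_t, eps, alpha_bar_t=0.5, alpha_bar_prev=0.7)
print(x_prev.shape)
```

In a full T = 50 run this step would be iterated over a decreasing schedule of `alpha_bar` values; the same update with inverted timestep order gives the DDIM inversion chain the paper aligns against.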