Real-World Image Variation by Aligning Diffusion Inversion Chain
Authors: Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that our proposed approach outperforms existing methods concerning semantic similarity and perceptual quality. |
| Researcher Affiliation | Collaboration | ¹The Chinese University of Hong Kong, ²SmartMore |
| Pseudocode | No | The paper describes its methods using equations and prose but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://rival-diff.github.io |
| Open Datasets | Yes | Our study obtained a high-quality test set of reference images from the Internet and DreamBooth [12] to ensure a diverse image dataset. |
| Dataset Splits | No | The paper mentions using a "test set" and "evaluation samples are from two datasets" but does not provide specific details on training, validation, or test dataset splits or percentages. |
| Hardware Specification | Yes | Experiments run on a single NVIDIA RTX 4090 GPU, taking 8 seconds to generate an image variation with batch size 1. |
| Software Dependencies | Yes | Our baseline model is Stable-Diffusion V1.5. During the image inversion and generation, we employed DDIM sample steps T = 50 for each image and set the classifier-free guidance scale m = 7 in Eq. (8). We split two stages at t_align = t_early = 30 for attention alignment in Eq. (3) and latent alignment in Eq. (8). In addition, we employ the shuffle strategy described in Eq. (6) to initialize the starting latent x_T^G. Experiments run on a single NVIDIA RTX 4090 GPU, taking 8 seconds to generate an image variation with batch size 1. |
| Experiment Setup | Yes | During the image inversion and generation, we employed DDIM sample steps T = 50 for each image and set the classifier-free guidance scale m = 7 in Eq. (8). We split two stages at t_align = t_early = 30 for attention alignment in Eq. (3) and latent alignment in Eq. (8). In addition, we employ the shuffle strategy described in Eq. (6) to initialize the starting latent x_T^G. |
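
For context, the sketch below shows how the sampler settings quoted in the table (Stable Diffusion v1.5, DDIM with T = 50 steps, classifier-free guidance scale 7, single-image batches) could be set up with the Hugging Face `diffusers` library. This is an assumption for illustration only: the paper's actual implementation is linked from the project page, and RIVAL's cross-image attention alignment, latent alignment, and shuffle-based latent initialization require custom denoising hooks that are not implemented here.

```python
# Minimal sketch of the reported sampler configuration only.
# RIVAL's attention alignment (Eq. 3), latent alignment (Eq. 8), and
# latent-shuffle initialization (Eq. 6) are NOT implemented here.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # baseline checkpoint from the paper
NUM_STEPS = 50        # DDIM sample steps T
GUIDANCE_SCALE = 7.0  # classifier-free guidance scale m
T_ALIGN = 30          # two-stage split t_align = t_early (reference only)

# Load the pipeline and swap in a DDIM scheduler.
pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")  # the paper reports a single RTX 4090 at batch size 1

# Generate one image with the quoted step count and guidance scale.
image = pipe(
    "a photo of a dog on the beach",  # placeholder prompt, not from the paper
    num_inference_steps=NUM_STEPS,
    guidance_scale=GUIDANCE_SCALE,
).images[0]
image.save("variation_baseline.png")
```

Obtaining the starting latent x_T^G from a reference image would additionally require DDIM inversion (e.g., via `diffusers`' `DDIMInverseScheduler`) before applying the paper's alignment and shuffle steps.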