Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the proposed method outperforms previous works in reconstruction and consistent editing, and produces impressive results in various settings. We first quantitatively evaluate the reconstruction quality of different inversion-based methods on 200 randomly selected images from the MS-COCO validation set. We measure the reconstruction quality by Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), and efficiency by reconstruction time (Time). As provided in Table 1, the reconstruction quality of our method is significantly superior to DDIM reconstruction, attaining a level of reconstruction that is comparable to VAE, which serves as an upper bound for reconstruction. [A minimal PSNR/SSIM evaluation sketch follows the table.] |
| Researcher Affiliation | Collaboration | (1) School of Automation Science and Electrical Engineering, Beihang University, China; (2) Meituan; (3) Hangzhou Research Institute, Beihang University, China; (4) Zhongguancun Laboratory, Beijing, China; (5) Nanchang Institute of Technology, Nanchang, China |
| Pseudocode | Yes | Algorithm 1: Tuning-free Inversion-enhanced Control (TIC) for Consistent Image Editing |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | For the dataset, we evaluate the reconstruction quality of VAE, DDIM, NTI, PTI and our method on 200 randomly selected images from the MS-COCO 2017 validation set (Lin et al. 2014). |
| Dataset Splits | No | The paper mentions using 200 randomly selected images from the MS-COCO 2017 validation set for evaluation, but does not provide specific details on training, validation, or test splits for any model training or fine-tuning conducted by the authors. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Stable Diffusion v1.4' and classifier-free guidance settings, but does not provide specific versions of ancillary software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions) needed for replication. |
| Experiment Setup | Yes | For the DDIM schedule, we perform both inversion and sampling for 50 steps, and retain the original hyperparameter choices of Stable Diffusion. The classifier-free guidance (CFG) scale is set to 7.5 for editing. The step and layer at which TIC starts are set to t0 = 4 and l0 = 10, respectively. [A minimal configuration sketch follows the table.] |
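
The reconstruction evaluation quoted above (PSNR and SSIM over 200 randomly selected MS-COCO 2017 validation images) is described in the paper but not released as code. Below is a minimal sketch of how such an evaluation could be scripted, assuming scikit-image for the metrics and a one-to-one file layout between originals and reconstructions; the paths, the 512x512 resize, and the matching logic are assumptions, not the authors' implementation.

```python
# Hedged sketch of the quoted evaluation protocol: average PSNR/SSIM over a
# random subset of 200 MS-COCO 2017 validation images. scikit-image metrics,
# the directory layout, and the 512x512 resize are assumptions, not the
# authors' implementation.
import random
from pathlib import Path

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def load_rgb(path, size=(512, 512)):
    """Load an image as a float32 RGB array in [0, 1], resized to a common size."""
    img = Image.open(path).convert("RGB").resize(size, Image.BICUBIC)
    return np.asarray(img, dtype=np.float32) / 255.0


def evaluate_reconstruction(original_dir, reconstructed_dir, num_images=200, seed=0):
    """Average PSNR/SSIM over a random subset of matching filenames."""
    originals = sorted(Path(original_dir).glob("*.jpg"))
    random.Random(seed).shuffle(originals)
    psnr_scores, ssim_scores = [], []
    for orig_path in originals[:num_images]:
        recon_path = Path(reconstructed_dir) / orig_path.name
        if not recon_path.exists():
            continue
        orig, recon = load_rgb(orig_path), load_rgb(recon_path)
        psnr_scores.append(peak_signal_noise_ratio(orig, recon, data_range=1.0))
        ssim_scores.append(
            structural_similarity(orig, recon, channel_axis=-1, data_range=1.0)
        )
    return float(np.mean(psnr_scores)), float(np.mean(ssim_scores))


if __name__ == "__main__":
    psnr, ssim = evaluate_reconstruction("coco2017_val", "tic_reconstructions")
    print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")
```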
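
The Experiment Setup row pins down Stable Diffusion v1.4, a 50-step DDIM schedule, and a CFG scale of 7.5, but the paper names no framework and releases no code. The following sketch shows how those settings could be reproduced with Hugging Face diffusers; the library choice, the CompVis/stable-diffusion-v1-4 checkpoint name, and the placement of the inversion step are assumptions, and the TIC-specific controls (t0 = 4, l0 = 10) would require the authors' unreleased attention controller.

```python
# Hedged sketch of the quoted sampling settings (Stable Diffusion v1.4,
# 50 DDIM steps, CFG scale 7.5) using Hugging Face diffusers. The TIC
# attention controller (t0 = 4, l0 = 10) is not public and is not
# reproduced here; this only mirrors the stated schedule and guidance.
import torch
from diffusers import DDIMInverseScheduler, DDIMScheduler, StableDiffusionPipeline

MODEL_ID = "CompVis/stable-diffusion-v1-4"  # assumed checkpoint identifier
NUM_STEPS = 50                              # DDIM inversion and sampling steps
CFG_SCALE = 7.5                             # classifier-free guidance for editing

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)

# Keep Stable Diffusion's original hyperparameters; only swap in DDIM schedulers.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
# `inverse_scheduler` marks where DDIM inversion of the source image would run
# before the TIC-controlled sampling pass in the authors' pipeline.

image = pipe(
    "a photo of a cat wearing a red scarf",  # illustrative edit prompt
    num_inference_steps=NUM_STEPS,
    guidance_scale=CFG_SCALE,
).images[0]
image.save("edited.png")
```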