Gradient-free Decoder Inversion in Latent Diffusion Models

Authors: Seongmin Hong, Suh Yoon Jeon, Kyeonghyun Lee, Ernest Ryu, Se Young Chun

NeurIPS 2024 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we propose an efficient gradient-free decoder inversion for LDMs, which can be applied to diverse latent models. Theoretical convergence property of our proposed inversion has been investigated not only for the forward step method, but also for the inertial Krasnoselskii-Mann (KM) iterations under mild assumption on cocoercivity that is satisfied by recent LDMs. Our proposed gradient-free method with Adam optimizer and learning rate scheduling significantly reduced computation time and memory usage over prior gradient-based methods and enabled efficient computation in applications such as noise-space watermarking and background-preserving image editing while achieving comparable error levels. (Illustrative sketches of the gradient-free update and its inertial KM variant appear after this table.)
Researcher Affiliation | Academia | Seongmin Hong¹, Suh Yoon Jeon¹, Kyeonghyun Lee¹, Ernest K. Ryu², Se Young Chun¹,³; ¹Dept. of Electrical and Computer Engineering and ³INMC & IPAI, Seoul National University; ²Dept. of Mathematics, University of California, Los Angeles
Pseudocode | No | No explicit pseudocode or algorithm blocks found.
Open Source Code | Yes | Project page: https://smhongok.github.io/dec-inv.html ... we used the code from Hong et al. [14]’s official repository (https://github.com/smhongok/inv-DM) to perform decoder inversion ...
Open Datasets | Yes | Stable Diffusion 2.1 [37] is a widely known open-source text-to-image generation model. LaVie [47] is a text-conditioned video generation model, which can generate consecutive frames per inference. InstaFlow [24] is a one-step text-to-image generation model that can generate images whose quality is as good as Stable Diffusion. These three LDMs will be used in all the experiments of our work. ... For Stable Diffusion 2.1 and InstaFlow, we used prompts from https://huggingface.co/datasets/Gustavosta/Stable-Diffusion-Prompts. (A prompt-loading sketch appears after this table.)
Dataset Splits | No | The paper evaluates the proposed inversion method on already-trained LDMs rather than training a model on a dataset, so no train/validation/test splits are reported for the inversion method itself.
Hardware Specification | Yes | For running Stable Diffusion 2.1, one NVIDIA GeForce RTX 3090 Ti was used. The RAM size of the GPU was 24 GB. Note that most of the computation was conducted on GPU. For CPU, one 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz was used. For running LaVie, one NVIDIA A100 SXM4 80GB and AMD EPYC 7742 64-Core Processor were used. For InstaFlow, one NVIDIA GeForce RTX 3090 GPU with one 12th Gen Intel(R) Core(TM) i7-12700K @ 3.60GHz was used.
Software Dependencies | No | Adam: A method for stochastic optimization. In ICLR, 2015. ... The paper mentions the Adam optimizer [16] as a method but does not provide specific version numbers for software libraries, programming languages, or other dependencies required to replicate the experimental environment.
Experiment Setup | Yes | With or without momentum, the learning rate ρ was fixed at 0.001. For inertial KM iterations, α was set to 0.9. ... In [14], the learning rate was 0.1 with 100 iterations, but it showed long runtimes. Thus, we set the iterations to 20, 30, 50, and 100. ... The classifier-free guidance was 3.0 in Stable Diffusion 2.1, 7.5 in LaVie, and 1.0 in InstaFlow. ... 16-bit precision and 50 iterations (as Adam converges faster than others). (These reported settings are reused in the sketches below.)
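
As a concrete illustration of the method summarized in the Research Type and Experiment Setup rows, here is a minimal PyTorch-style sketch of a gradient-free decoder inversion loop with Adam and learning-rate scheduling. This is a sketch under assumptions, not the authors' implementation: we assume the gradient-free update direction is the latent-space residual encoder(decoder(z)) - encoder(x), computed with forward passes only; the learning rate of 0.001 and 50 iterations come from the Experiment Setup row, while the cosine schedule is one plausible choice of scheduling. `decoder`, `encoder`, and `target_image` are hypothetical handles to the LDM's VAE and the image being inverted.

```python
import math
import torch


@torch.no_grad()  # gradient-free: only forward passes through the VAE, no backprop
def decoder_inversion_adam(decoder, encoder, target_image,
                           lr=1e-3, num_iters=50,
                           betas=(0.9, 0.999), eps=1e-8):
    """Sketch: find a latent z such that decoder(z) ~= target_image."""
    # The paper reports 16-bit precision; in practice the VAE would be run in half precision.
    z_target = encoder(target_image)   # E(x), computed once
    z = z_target.clone()               # initialize at the encoder estimate

    # Adam moment buffers for the surrogate (gradient-free) direction.
    m = torch.zeros_like(z)
    v = torch.zeros_like(z)

    for k in range(1, num_iters + 1):
        # Cosine learning-rate schedule (one plausible form of scheduling).
        lr_k = lr * 0.5 * (1.0 + math.cos(math.pi * (k - 1) / num_iters))

        # Assumed gradient-free direction: latent-space residual E(D(z)) - E(x).
        direction = encoder(decoder(z)) - z_target

        # Standard Adam update, with the surrogate direction in place of a gradient.
        m = betas[0] * m + (1 - betas[0]) * direction
        v = betas[1] * v + (1 - betas[1]) * direction ** 2
        m_hat = m / (1 - betas[0] ** k)
        v_hat = v / (1 - betas[1] ** k)
        z = z - lr_k * m_hat / (v_hat.sqrt() + eps)

    return z
```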
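
The quoted abstract also mentions a convergence analysis for inertial Krasnoselskii-Mann (KM) iterations, with the momentum parameter α set to 0.9 in the experiments. Below is a similarly hedged sketch of the plain forward-step update with inertial extrapolation, using the same assumed residual operator as above.

```python
import torch


@torch.no_grad()
def decoder_inversion_inertial_km(decoder, encoder, target_image,
                                  lr=1e-3, alpha=0.9, num_iters=50):
    """Sketch: forward-step iteration with inertial (KM-style) extrapolation."""
    z_target = encoder(target_image)
    z_prev = z_target.clone()
    z = z_target.clone()

    for _ in range(num_iters):
        # Inertial extrapolation: y_k = z_k + alpha * (z_k - z_{k-1}).
        y = z + alpha * (z - z_prev)
        z_prev = z
        # Forward step on the assumed residual operator A(z) = E(D(z)) - E(x).
        z = y - lr * (encoder(decoder(y)) - z_target)

    return z
```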
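
For completeness, the prompt set cited in the Open Datasets row is hosted on the Hugging Face Hub and can be loaded with the `datasets` library. The split and column names below are taken from the public dataset card rather than the paper, so treat them as assumptions.

```python
from datasets import load_dataset

# Prompts reported for the Stable Diffusion 2.1 and InstaFlow experiments.
prompts = load_dataset("Gustavosta/Stable-Diffusion-Prompts", split="train")
print(prompts[0]["Prompt"])  # column name per the dataset card (assumption)
```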