Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps

Authors: Nikita Starodubcev, Mikhail Khoroshikh, Artem Babenko, Dmitry Baranchuk

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply iCD to large-scale text-to-image models such as Stable Diffusion 1.5 [4] and XL [1] and extensively evaluate them for image editing problems. According to automated and human studies, we confirm that iCD unlocks faithful text-guided image editing for 6–8 steps and is comparable to state-of-the-art text-driven image manipulation methods while being multiple times faster."
Researcher Affiliation | Collaboration | Nikita Starodubcev (1, 2), Mikhail Khoroshikh (1, 2), Artem Babenko (1), Dmitry Baranchuk (1); (1) Yandex Research, (2) HSE University
Pseudocode | No | The paper describes methods and equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We attach the code in the supplementary material and will release it upon acceptance. ... We plan to release our code and models upon acceptance under the CC-BY 4.0 license."
Open Datasets | Yes | "To evaluate the inversion quality, we consider 5K images and the corresponding prompts from the MS-COCO dataset [56]. ... For SD1.5 distillation, we use a 20M subset of LAION-2B, roughly filtered using CLIP score [64]."
Dataset Splits | No | The paper mentions evaluating on subsets of data (e.g., "1000 COCO2014 prompts" for ImageReward) but does not provide specific training/validation/test splits (e.g., percentages or sample counts) for model training; it describes the evaluation data, not a training/validation split.
Hardware Specification | Yes | "The iCD-SD1.5 models are trained for 36h and the iCD-XL ones for 68h on 8 NVIDIA A100 GPUs."
Software Dependencies | No | The paper mentions specific methods and components (e.g., "LoRA adapters with a rank of 64", "DDIM solver") but does not specify version numbers for software dependencies, libraries, or programming languages (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "For the SD1.5 model, we use a global batch size of 512, and for the SDXL 128. All models converge relatively fast, requiring about 6K iterations with a learning rate 8e-6. ... The regularization coefficients for the forward and reverse preservation losses are λf = 1.5 and λr = 1.5. ... We set the hyperparameters of the dynamic CFG to τ = 0.8 and maximum CFG scale to 19.0."
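The reported training hyperparameters can be collected into a single configuration sketch. This is illustrative only: the dictionary keys and structure are assumptions, not the authors' actual config format; the numeric values are the ones quoted above.

```python
# Hedged sketch of the iCD training configuration reported in the paper.
# Key names are hypothetical; values come from the Experiment Setup quote.
icd_train_config = {
    "sd15": {"global_batch_size": 512},
    "sdxl": {"global_batch_size": 128},
    "iterations": 6_000,        # "about 6K iterations"
    "learning_rate": 8e-6,
    "lambda_forward": 1.5,      # forward preservation loss coefficient
    "lambda_reverse": 1.5,      # reverse preservation loss coefficient
    "dynamic_cfg": {
        "tau": 0.8,             # dynamic CFG threshold
        "max_cfg_scale": 19.0,  # maximum CFG scale
    },
}

print(icd_train_config["sd15"]["global_batch_size"])  # 512
```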