Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
Authors: Nikita Starodubcev, Mikhail Khoroshikh, Artem Babenko, Dmitry Baranchuk
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply iCD to large-scale text-to-image models such as Stable Diffusion 1.5 [4] and XL [1] and extensively evaluate them for image editing problems. According to automated and human studies, we confirm that iCD unlocks faithful text-guided image editing for 6-8 steps and is comparable to state-of-the-art text-driven image manipulation methods while being multiple times faster. |
| Researcher Affiliation | Collaboration | Nikita Starodubcev (1,2), Mikhail Khoroshikh (1,2), Artem Babenko (1), Dmitry Baranchuk (1); 1: Yandex Research, 2: HSE University |
| Pseudocode | No | The paper describes methods and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We attach the code in the supplementary material and will release it upon acceptance. ... We plan to release our code and models upon acceptance under the CC-by 4.0 license. |
| Open Datasets | Yes | To evaluate the inversion quality, we consider 5K images and the corresponding prompts from the MS-COCO dataset [56]. ... For SD1.5 distillation, we use a 20M subset of LAION 2B, roughly filtered using CLIP score [64]. |
| Dataset Splits | No | The paper mentions evaluating performance on subsets of data (e.g., '1000 COCO2014 prompts' for ImageReward) but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for model training. It describes the data used for evaluation, not a training/validation split. |
| Hardware Specification | Yes | The i CD-SD1.5 models are trained for 36h and the i CD-XL ones for 68h on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using specific methods and models (e.g., 'LoRA adapters with a rank of 64', 'DDIM solver') but does not specify version numbers for general software dependencies, libraries, or programming languages used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For the SD1.5 model, we use a global batch size of 512, and for the SDXL, 128. All models converge relatively fast, requiring about 6K iterations with a learning rate of 8e-6. ... The regularization coefficients for the forward and reverse preservation losses are λ_f = 1.5 and λ_r = 1.5. ... We set the hyperparameters of the dynamic CFG to τ = 0.8 and the maximum CFG scale to 19.0. |
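
The reported experiment setup can be collected into a small configuration sketch. The following is a minimal Python sketch, assuming a dataclass-style config and a simple threshold rule for the dynamic CFG schedule; the names `TrainConfig` and `dynamic_cfg_scale` and the exact thresholding rule are illustrative assumptions, and only the numeric values are taken from the paper.

```python
# Hedged sketch of the hyperparameters quoted above; not the authors' released code.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Numeric values quoted from the paper's experiment setup.
    global_batch_size: int = 512   # 512 for SD1.5; the paper reports 128 for SDXL
    iterations: int = 6_000        # "about 6K iterations"
    learning_rate: float = 8e-6
    lambda_forward: float = 1.5    # forward preservation loss weight
    lambda_reverse: float = 1.5    # reverse preservation loss weight
    lora_rank: int = 64            # LoRA adapter rank mentioned in the paper


def dynamic_cfg_scale(t: float, tau: float = 0.8, max_scale: float = 19.0) -> float:
    """Toy dynamic-CFG schedule (assumption): apply the full guidance scale only
    on the fraction of the sampling trajectory below tau (t normalized to [0, 1]),
    and disable guidance elsewhere. The paper only reports tau = 0.8 and a
    maximum CFG scale of 19.0, not this exact rule."""
    return max_scale if t < tau else 1.0


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg)
    print([dynamic_cfg_scale(step / 8) for step in range(8)])
```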