Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference

Authors: Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive empirical results show that FISEdit can be 3.4× and 4.4× faster than existing methods on NVIDIA TITAN RTX and A100 GPUs respectively, and even generates more satisfactory images."
Researcher Affiliation | Academia | Zihao Yu (1), Haoyang Li (1), Fangcheng Fu (1), Xupeng Miao (2), Bin Cui (1,3). (1) School of Computer Science & Key Lab of High Confidence Software Technologies (MOE), Peking University; (2) Carnegie Mellon University; (3) Institute of Computational Social Science, Peking University (Qingdao), China
Pseudocode | No | The paper describes the methods in text and figures, but presents no structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We implement our system based on Hugging Face's diffusers, which is a generic framework for training and inference of diffusion models. We clone this project and integrate it with our self-developed sparse inference engine Hetu (Miao et al. 2022c,a,b)... and more details about our evaluation configurations can be found in our repository." Repository: https://github.com/Hankpipi/diffusers-hetu
Open Datasets | Yes | "We select LAION-Aesthetics (Schuhmann et al. 2022) as the evaluation dataset... The processed dataset consists of 454,445 examples..." Dataset: http://instruct-pix2pix.eecs.berkeley.edu/
Dataset Splits | No | The paper states the LAION-Aesthetics dataset and its total size, but does not specify explicit percentages or counts for training, validation, or test splits.
Hardware Specification | Yes | "Eventually, we accelerate text-to-image inference by up to 4.4× on NVIDIA TITAN RTX and 3.4× on NVIDIA A100 when the edit size is 5%."
Software Dependencies | No | The paper names Hugging Face's diffusers and Hetu as software components but does not provide specific version numbers for them.
Experiment Setup | Yes | "We vary InstructPix2Pix's image guidance scale between [1.0, 2.5], SDEdit's strength between [0.5, 0.75], DiffEdit's strength between [0.5, 1.0], and edited size between [0.25, 0.75] for SDIP and our method."