Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference
Authors: Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results show that FISEdit can be 3.4× and 4.4× faster than existing methods on NVIDIA TITAN RTX and A100 GPUs respectively, and even generates more satisfactory images. |
| Researcher Affiliation | Academia | Zihao Yu¹, Haoyang Li¹, Fangcheng Fu¹, Xupeng Miao², Bin Cui¹,³. ¹School of Computer Science & Key Lab of High Confidence Software Technologies (MOE), Peking University; ²Carnegie Mellon University; ³Institute of Computational Social Science, Peking University (Qingdao), China |
| Pseudocode | No | The paper describes the methods in text and figures, but no structured pseudocode or algorithm blocks are explicitly presented. |
| Open Source Code | Yes | We implement our system based on Hugging Face's diffusers, which is a generic framework for training and inference of diffusion models. We clone this project and integrate it with our self-developed sparse inference engine Hetu (Miao et al. 2022c,a,b)... and more details about our evaluation configurations can be found in our repository: https://github.com/Hankpipi/diffusers-hetu |
| Open Datasets | Yes | We select LAION-Aesthetics (Schuhmann et al. 2022) as the evaluation dataset... The processed dataset consists of 454,445 examples... http://instruct-pix2pix.eecs.berkeley.edu/ |
| Dataset Splits | No | The paper mentions using LAION-Aesthetics dataset and its total size, but does not specify the explicit percentages or counts for training, validation, or test splits. |
| Hardware Specification | Yes | Eventually, we accelerate text-to-image inference by up to 4.4× on NVIDIA TITAN RTX and 3.4× on NVIDIA A100 when the edit size is 5%. |
| Software Dependencies | No | The paper mentions Hugging Face's diffusers and Hetu as software components but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We vary Instruct-Pix2Pix's image guidance scale between [1.0, 2.5], SDEdit's strength between [0.5, 0.75], DiffEdit's strength between [0.5, 1.0], and edited size between [0.25, 0.75] for SDIP and our method. |
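
The sweep ranges quoted in the Experiment Setup row can be sketched as a small parameter grid. This is a minimal illustration only: the ranges come from the paper's quoted setup, but the dictionary keys, the `grid` helper, and the choice of evenly spaced values are assumptions, not part of the authors' code.

```python
# Hypothetical sketch of the hyperparameter sweep described in the
# Experiment Setup row. Ranges are quoted from the paper; everything
# else (names, even spacing, number of steps) is assumed.

# Per-method (low, high) ranges from the quoted setup.
SWEEP_RANGES = {
    "instruct_pix2pix.image_guidance_scale": (1.0, 2.5),
    "sdedit.strength": (0.5, 0.75),
    "diffedit.strength": (0.5, 1.0),
    "sdip_and_ours.edited_size": (0.25, 0.75),
}

def grid(low, high, steps):
    """Evenly spaced sweep values in [low, high], endpoints included."""
    return [low + (high - low) * i / (steps - 1) for i in range(steps)]

# Example: expand each range into a 3-point grid.
sweep = {name: grid(lo, hi, 3) for name, (lo, hi) in SWEEP_RANGES.items()}
```

A real evaluation would pass each sampled value to the corresponding pipeline argument (e.g. a guidance-scale or strength parameter) and compare outputs across methods.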