Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
Authors: Siyu Zou, Jiji Tang, Yiyi Zhou, Jing He, Chaoyi Zhao, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate InstDiffEdit, we also conduct extensive experiments on ImageNet and Imagen, and compare it with a bunch of the SOTA methods. The experimental results show that InstDiffEdit not only outperforms the SOTA methods in both image quality and editing results, but also has a much faster inference speed, i.e., +5 to +6 times. |
| Researcher Affiliation | Collaboration | 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China 2 Fuxi AI Lab, NetEase Inc., Hangzhou, China |
| Pseudocode | No | Insufficient information. The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/xiaotianqing/InstDiffEdit |
| Open Datasets | Yes | We use ImageNet, Imagen and Editing-Mask to evaluate the performance of the semantic editing task. ImageNet: Following the evaluation of FlexIT (Couairon et al. 2022a), a total of 1092 images in ImageNet (Deng et al. 2009) are included, covering 273 categories. Imagen: We construct an evaluation dataset for semantic editing by utilizing the generations from the Imagen (Saharia et al. 2022) model. |
| Dataset Splits | No | Insufficient information. The paper mentions using the ImageNet and Imagen datasets but does not explicitly provide the training/validation/test splits (e.g., percentages or sample counts) used for these experiments within the text. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100. |
| Software Dependencies | Yes | The framework of Inst Diff Edit is based on Stable Diffusion v1.4. |
| Experiment Setup | Yes | We use 50 steps of the LDMScheduler sampler with a guidance scale of 7.5, and set the noise strength to r = 0.5, the binarization threshold to φ = 0.2, and the thresholds for attention refinement defined in Eq. 8 to 0.9 and 0.6 by default, respectively. |
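
The binarization threshold φ = 0.2 reported in the setup row can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the normalization scheme, the function name `binarize_attention`, and the toy attention map are all assumptions made for the example.

```python
import numpy as np

# Thresholds reported in the paper's experiment setup.
PHI = 0.2                   # binarization threshold (phi = 0.2)
TAU_HI, TAU_LO = 0.9, 0.6   # attention-refinement thresholds (Eq. 8 defaults)

def binarize_attention(attn: np.ndarray, phi: float = PHI) -> np.ndarray:
    """Min-max normalize an attention map to [0, 1], then threshold at phi
    to obtain a binary editing mask (assumed normalization for illustration)."""
    a = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    return (a > phi).astype(np.float32)

# Toy usage: a 4x4 attention map with one salient region in the center.
attn = np.array([[0.1, 0.1, 0.1, 0.1],
                 [0.1, 0.9, 0.8, 0.1],
                 [0.1, 0.7, 0.9, 0.1],
                 [0.1, 0.1, 0.1, 0.1]])
mask = binarize_attention(attn)  # 1.0 inside the salient region, 0.0 outside
```

In the paper's pipeline this mask would then be refined using the two attention-refinement thresholds (0.9 and 0.6) before guiding the diffusion-based edit; that refinement step is not reproduced here.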