Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
Authors: Siyu Zou, Jiji Tang, Yiyi Zhou, Jing He, Chaoyi Zhao, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate InstDiffEdit, we also conduct extensive experiments on ImageNet and Imagen, and compare it with a bunch of the SOTA methods. The experimental results show that InstDiffEdit not only outperforms the SOTA methods in both image quality and editing results, but also has a much faster inference speed, i.e., +5 to +6 times. |
| Researcher Affiliation | Collaboration | 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China 2 Fuxi AI Lab, NetEase Inc., Hangzhou, China |
| Pseudocode | No | Insufficient information. The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/xiaotianqing/InstDiffEdit |
| Open Datasets | Yes | We use ImageNet, Imagen and Editing-Mask to evaluate the performance of the semantic editing task. ImageNet: Following the evaluation of FlexIT (Couairon et al. 2022a), a total of 1092 images in ImageNet (Deng et al. 2009) are included, covering 273 categories. Imagen: We construct an evaluation dataset for semantic editing by utilizing the generations from the Imagen (Saharia et al. 2022) model. |
| Dataset Splits | No | Insufficient information. The paper mentions using the ImageNet and Imagen datasets but does not explicitly provide the training/validation/test splits (e.g., percentages or sample counts) used for these experiments within the text. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100. |
| Software Dependencies | Yes | The framework of Inst Diff Edit is based on Stable Diffusion v1.4. |
| Experiment Setup | Yes | We use 50 steps of the LDMScheduler sampler with a guidance scale of 7.5, and set the noise strength to r = 0.5, the binarization threshold to φ = 0.2, and the thresholds for attention refinement defined in Eq. 8 to 0.9 and 0.6 by default, respectively. |
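
The binarization threshold φ = 0.2 reported in the setup row can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the normalization scheme, the function name `binarize_attention`, and the toy attention map are all assumptions made for the example.

```python
import numpy as np

# Thresholds reported in the paper's experiment setup.
PHI = 0.2                   # binarization threshold (phi = 0.2)
TAU_HI, TAU_LO = 0.9, 0.6   # attention-refinement thresholds (Eq. 8 defaults)

def binarize_attention(attn: np.ndarray, phi: float = PHI) -> np.ndarray:
    """Min-max normalize an attention map to [0, 1], then threshold at phi
    to obtain a binary editing mask (assumed normalization for illustration)."""
    a = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    return (a > phi).astype(np.float32)

# Toy usage: a 4x4 attention map with one salient region in the center.
attn = np.array([[0.1, 0.1, 0.1, 0.1],
                 [0.1, 0.9, 0.8, 0.1],
                 [0.1, 0.7, 0.9, 0.1],
                 [0.1, 0.1, 0.1, 0.1]])
mask = binarize_attention(attn)  # 1.0 inside the salient region, 0.0 outside
```

In the paper's pipeline this mask would then be refined using the two attention-refinement thresholds (0.9 and 0.6) before guiding the diffusion-based edit; that refinement step is not reproduced here.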