BARET: Balanced Attention Based Real Image Editing Driven by Target-Text Inversion
Authors: Yuming Qiao, Fanyi Wang, Jingwen Su, Yanhao Zhang, Yunjie Yu, Siyu Wu, Guo-Jun Qi
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In order to demonstrate the editing capability, effectiveness, and efficiency of the proposed BARET, we have conducted extensive qualitative and quantitative experiments. Moreover, results from a user study and an ablation study further prove its superiority over other methods. |
| Researcher Affiliation | Collaboration | Yuming Qiao1,2, Fanyi Wang1*, Jingwen Su1, Yanhao Zhang1, Yunjie Yu1, Siyu Wu3, Guo-Jun Qi1,4 1OPPO Research Institute 2Tsinghua University 3Zhejiang University 4Westlake University |
| Pseudocode | Yes | Algorithm 1: Target-text inversion |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | To this end, we refer to TEdBench (Kawar et al. 2023) and collected 100 pairs of real images and textual descriptions for editing. |
| Dataset Splits | No | The paper mentions collecting 100 pairs for a user study and describes tuning iterations, but does not provide specific train/validation/test dataset splits for model training. |
| Hardware Specification | Yes | The inversion stage of our method takes only about 16s on a single A100, which greatly improves the editing efficiency compared to methods that require fine-tuning diffusion model such as SINE and Imagic. |
| Software Dependencies | Yes | All experiments are based on stable diffusion v1.5 (Rombach et al. 2022), implemented DDIM sampling strategy with 50 steps and guidance scale 7.5. |
| Experiment Setup | Yes | All experiments are based on Stable Diffusion v1.5 (Rombach et al. 2022), using the DDIM sampling strategy with 50 steps and guidance scale 7.5. For target-text inversion, the target text embedding is fine-tuned with an MSE loss, 250 tuning iterations in total (5 iterations per step), using the Adam optimizer (Kingma and Ba 2015) with learning rate 0.001. The progressive loss threshold for early stopping is defined as {t·1e-5}_{t=1}^{T} across timesteps, so that reconstruction quality is boosted by a low loss threshold at the early stage of denoising, with the threshold gradually raised at later stages. |
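The target-text inversion loop described in the setup (Algorithm 1 in the paper) can be sketched as follows. This is a hedged, self-contained illustration only: the `toy_prediction` objective and plain gradient-descent update are stand-ins I introduce for the Stable Diffusion v1.5 UNet noise prediction and the Adam optimizer, which are not reproduced here. The hyperparameters (50 DDIM steps, 5 iterations per step, learning rate 1e-3, progressive threshold t·1e-5) follow the paper's quoted values.

```python
T = 50              # DDIM denoising steps (paper: 50 steps)
ITERS_PER_STEP = 5  # tuning iterations per timestep (paper: 5, 250 total)
LR = 1e-3           # learning rate from the paper


def progressive_threshold(t: int) -> float:
    """Early-stop loss threshold at timestep t: {t * 1e-5}_{t=1}^{T}.

    Low at early denoising stages (small t) to boost reconstruction
    quality, and gradually raised at later stages.
    """
    return t * 1e-5


def tune_embedding(embedding, target, t):
    """Fine-tune an embedding for one timestep with early stopping.

    Stand-in for one step of target-text inversion: the real method
    minimizes an MSE between the diffusion model's noise prediction and
    the DDIM trajectory; here the objective is a toy sum-of-squares
    distance ||embedding - target||^2 so the sketch is runnable.
    """
    loss = sum((e - g) ** 2 for e, g in zip(embedding, target))
    for _ in range(ITERS_PER_STEP):
        if loss < progressive_threshold(t):  # progressive early stop
            break
        # Gradient of the sum-of-squares objective: 2 * (e - g)
        grad = [2.0 * (e - g) for e, g in zip(embedding, target)]
        # Plain gradient descent stands in for Adam here
        embedding = [e - LR * g for e, g in zip(embedding, grad)]
        loss = sum((e - g) ** 2 for e, g in zip(embedding, target))
    return embedding, loss


def invert(embedding, target):
    """Run tuning over all T timesteps (at most T * ITERS_PER_STEP = 250)."""
    for t in range(T, 0, -1):  # denoising proceeds from timestep T down to 1
        embedding, loss = tune_embedding(embedding, target, t)
    return embedding, loss
```

Note that the per-timestep budget times the number of steps (5 × 50) matches the paper's 250 total iterations, and the threshold schedule makes early stopping strictest at the earliest (low-t) denoising stages.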