ParallelEdits: Efficient Multi-Aspect Text-Driven Image Editing with Attention Grouping

Authors: Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Lokhande, Siwei Lyu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 experiments. "Table 1: Comparison results in multi-aspect image editing on the PIE-Bench++ dataset."
Researcher Affiliation | Academia | "Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu, University at Buffalo, State University of New York, USA"
Pseudocode | Yes | "A Parallel Edits: The Algorithm. In this section we provide Algorithm 1: Early Aspect Grouping and Algorithm 2: Parallel Edits on a particular branch."
Open Source Code | Yes | "Codes are available at: https://mingzhenhuang.github.io/projects/ParallelEdits.html. The code and data will be open-sourced for academic use."
Open Datasets | Yes | "Additionally, we introduce the PIE-Bench++ dataset, an expansion of the original PIE-Bench dataset, to better support evaluating image-editing tasks involving multiple objects and attributes simultaneously. Codes are available at: https://mingzhenhuang.github.io/projects/ParallelEdits.html. The code and data will be open-sourced for academic use."
Dataset Splits | No | The paper uses the PIE-Bench++ dataset for evaluation but does not specify train/validation/test splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | Yes | "Our proposed Parallel Edits is based on the Latent Consistency Model [32]," using the publicly available LCM fine-tuned from Stable Diffusion v1.5.
Experiment Setup | Yes | "During sampling, we perform LCM sampling [32] with 15 denoising steps, and the classifier-free guidance (CFG) is set to 4.0." Parallel Edits can control the editing strength by adjusting the CFG. There is a trade-off between achieving satisfactory inversion and robust editing ability: a higher CFG tends to produce stronger editing effects but may degrade inversion quality and identity preservation. The hyper-parameters are set to θ = 0.9 and β = 0.8, where θ and β are used to determine the edit type of a given edit action.
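The quoted setup can be collected into a short sketch. The hyper-parameter values (15 LCM steps, CFG 4.0, θ = 0.9, β = 0.8) come from the paper; the `classify_edit_action` helper, its two similarity inputs, and the three edit-type labels are hypothetical illustrations of how θ/β thresholds might route an edit action, not the authors' actual rule.

```python
# Hyper-parameters quoted in the paper's experiment setup.
SAMPLING_CONFIG = {
    "num_inference_steps": 15,  # LCM denoising steps
    "guidance_scale": 4.0,      # classifier-free guidance (CFG)
}
THETA = 0.9  # θ threshold used when determining the edit type
BETA = 0.8   # β threshold used when determining the edit type


def classify_edit_action(sim_strict: float, sim_loose: float,
                         theta: float = THETA, beta: float = BETA) -> str:
    """Hypothetical threshold-based edit-type routing.

    The paper only states that θ and β "determine the edit type of a
    given edit action"; the two similarity scores and the labels below
    are assumptions made for illustration.
    """
    if sim_strict >= theta:
        return "attribute-change"    # very similar prompts: local attribute edit
    if sim_loose >= beta:
        return "object-replacement"  # moderate overlap: swap an object
    return "structural-edit"         # low overlap: larger structural change
```

A higher `guidance_scale` in `SAMPLING_CONFIG` corresponds to the stronger-editing / weaker-inversion trade-off described above.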