ParallelEdits: Efficient Multi-Aspect Text-Driven Image Editing with Attention Grouping
Authors: Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Lokhande, Siwei Lyu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 experiments; e.g., "Table 1: Comparison results in multi-aspect image editing on the PIE-Bench++ dataset." |
| Researcher Affiliation | Academia | Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu — University at Buffalo, State University of New York, USA |
| Pseudocode | Yes | A Parallel Edits: The Algorithm In this section we provide Algorithm 1: Early Aspect Grouping and Algorithm 2: Parallel Edits on a particular branch. |
| Open Source Code | Yes | Codes are available at: https://mingzhenhuang.github.io/projects/ParallelEdits.html. The code and data will be open-sourced for academic use. |
| Open Datasets | Yes | Additionally, we introduce the PIE-Bench++ dataset, an expansion of the original PIE-Bench dataset, to better support evaluating image-editing tasks involving multiple objects and attributes simultaneously. Codes are available at: https://mingzhenhuang.github.io/projects/ParallelEdits.html. The code and data will be open-sourced for academic use. |
| Dataset Splits | No | The paper mentions "PIE-Bench++ dataset" which is used for evaluation but does not specify details of train/validation/test splits (e.g., percentages or counts) within the dataset for model training and evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | Yes | Our proposed Parallel Edits is based on the Latent Consistency Model [32], with the publicly available LCM which is finetuned from Stable Diffusion v1.5. |
| Experiment Setup | Yes | During sampling, we perform LCM sampling [32] with 15 denoising steps, and the classifier-free guidance (CFG) is set to 4.0. ParallelEdits can control the editing strength by adjusting the CFG. There's a trade-off between achieving satisfactory inversion and robust editing ability: a higher CFG tends to produce stronger editing effects but may degrade inversion quality and identity preservation. We also set the hyper-parameters θ to 0.9 and β to 0.8 in our experiments, where θ and β are used to determine the edit type of a given edit action. |
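The quoted setup can be summarized in a short sketch. The names below (`ParallelEditsConfig`, `classifier_free_guidance`) are illustrative, not from the paper's released code; only the numeric values (15 denoising steps, CFG 4.0, θ = 0.9, β = 0.8) come from the quoted experiment setup, and the CFG update shown is the standard classifier-free guidance formula, not a claim about ParallelEdits internals.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ParallelEditsConfig:
    """Hypothetical container for the hyper-parameters quoted above."""
    num_denoising_steps: int = 15  # LCM sampling steps
    cfg_scale: float = 4.0         # classifier-free guidance strength
    theta: float = 0.9             # θ: threshold for classifying an edit action
    beta: float = 0.8              # β: second threshold for the edit type


def classifier_free_guidance(noise_uncond: np.ndarray,
                             noise_cond: np.ndarray,
                             scale: float) -> np.ndarray:
    """Standard CFG update: extrapolate from the unconditional prediction
    toward the text-conditioned one; a larger `scale` gives stronger edits."""
    return noise_uncond + scale * (noise_cond - noise_uncond)


cfg = ParallelEditsConfig()
uncond = np.zeros(4)   # toy unconditional noise prediction
cond = np.ones(4)      # toy text-conditioned noise prediction
guided = classifier_free_guidance(uncond, cond, cfg.cfg_scale)
```

A higher `cfg_scale` pushes the guided prediction further from the unconditional branch, which matches the quoted trade-off: stronger editing effects at the cost of inversion quality and identity preservation.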