Eliminating the Cross-Domain Misalignment in Text-guided Image Inpainting

Authors: Muqi Huang, Chaoyue Wang, Yong Luo, Lefei Zhang

IJCAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show exceptional performance on leading datasets such as MS-COCO and Open Images, surpassing state-of-the-art text-guided image inpainting methods.
Researcher Affiliation Collaboration Muqi Huang (1), Chaoyue Wang (2), Yong Luo (1), and Lefei Zhang (1,3). (1) Institute of Artificial Intelligence, School of Computer Science, Wuhan University; (2) JD Explore Academy; (3) Hubei Luojia Laboratory
Pseudocode No The paper describes the method and architecture but does not include any explicit pseudocode or algorithm blocks.
Open Source Code Yes Code is released at: https://github.com/MucciH/ECDM-inpainting.
Open Datasets Yes We fine-tune our model on the standard MS-COCO dataset [Lin et al., 2014], which comprises over 100k images in the training set. For testing, we utilize 5k image-text pairs from the MS-COCO validation set. To assess the robustness of our model to diverse data, we further validate its performance on 1.5k images from the Open Images dataset [Kuznetsova et al., 2020].
Dataset Splits Yes For testing, we utilize 5k image-text pairs from the MS-COCO validation set.
Hardware Specification Yes Each experiment requires one A100 GPU.
Software Dependencies Yes We employ our proposed Structure-Aware Inpainting Learning (SAIL) approach for image inpainting under the ControlNet architecture, and the model is fine-tuned from the ControlNet v1.1 inpaint checkpoint.
Experiment Setup Yes The learning rate is set at 5e-5, and the batch size is configured to be 4.
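The reported setup can be collected into a single configuration for reference. This is a minimal sketch: only the learning rate, batch size, base checkpoint, and GPU count come from the rows above; the dictionary structure and key names are illustrative assumptions, not the authors' actual training script.

```python
# Hyperparameters as reported in the paper's experiment setup rows.
# The dict layout and key names are assumptions for illustration only.
train_config = {
    "base_checkpoint": "ControlNet v1.1 inpaint",  # fine-tuning starting point (reported)
    "learning_rate": 5e-5,                         # reported
    "batch_size": 4,                               # reported
    "num_gpus": 1,                                 # one A100 per experiment (reported)
    "train_dataset": "MS-COCO train (>100k images)",
    "test_split": "MS-COCO val, 5k image-text pairs",
}

print(train_config["learning_rate"], train_config["batch_size"])
```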