MaGIC: Multi-modality Guided Image Completion

Authors: Hao Wang, Yongsheng Yu, Tiejian Luo, Heng Fan, Libo Zhang

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experiments show the superiority of MaGIC over state-of-the-art methods and its generalization to various completion tasks." |
| Researcher Affiliation | Academia | ¹School of Computer Science and Technology, University of Chinese Academy of Sciences; ²Institute of Software, Chinese Academy of Sciences; ³Department of Computer Science, University of Rochester; ⁴Department of Computer Science and Engineering, University of North Texas |
| Pseudocode | Yes | "Algorithm 1: Usage of CMB in MaGIC" |
| Open Source Code | No | "We intend to release our code for condition generation, enabling users to obtain modalities including sketch, pose and segmentation maps effortlessly for image editing purposes." |
| Open Datasets | Yes | "To verify the proposed MaGIC, we conduct extensive experiments on various tasks including image inpainting, outpainting, and real user-input editing, using the COCO (Lin et al., 2014), Places2 (Zhou et al., 2018), and in-the-wild data." |
| Dataset Splits | Yes | "All quantitative experiments are conducted on the COCO (Lin et al., 2014) and Places (Zhou et al., 2018) datasets. Evaluation of the methods involves using the first 1000 images in the COCO validation set and the first 5000 images in the Places validation set." |
| Hardware Specification | Yes | "All our experiments are conducted using 8 NVIDIA A100-40G GPUs." |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | "We set the batch size to 64 and employed the Adam optimizer (Kingma & Ba, 2015) with the learning rate of 1e-5 for training 10 epochs. These settings remain consistent across all conditions. For all diffusion-based methods, the denoising step T is set to 50." |
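The reported training setup and evaluation protocol can be collected into a minimal configuration sketch. Only the numeric values (batch size, learning rate, epochs, denoising steps, evaluation subset sizes) come from the report; the dict keys, function name, and placeholder image ids are illustrative assumptions, not the authors' code.

```python
# Hyperparameters as reported for MaGIC training; the dict structure
# itself is illustrative, not taken from the authors' code.
TRAIN_CONFIG = {
    "batch_size": 64,        # reported batch size
    "optimizer": "Adam",     # Kingma & Ba, 2015 (no version pinned in the paper)
    "learning_rate": 1e-5,   # reported learning rate
    "epochs": 10,            # reported training length
    "denoising_steps": 50,   # T = 50 for all diffusion-based methods
}

def eval_subset(image_ids, n):
    """Take the first n validation images, matching the reported protocol
    (first 1000 of the COCO val set, first 5000 of the Places val set)."""
    return image_ids[:n]

# Hypothetical ids standing in for the real validation sets.
coco_val_ids = list(range(40_000))
places_val_ids = list(range(36_500))

coco_eval = eval_subset(coco_val_ids, 1000)
places_eval = eval_subset(places_val_ids, 5000)
```

Because the subsets are simply the first N images of each validation set rather than a random sample, the evaluation is deterministic and reproducible without a stored split file or random seed.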