IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

Authors: Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate that our IMPUS can achieve smooth, direct, and realistic image morphing and is adaptable to several other generative tasks.
Researcher Affiliation | Collaboration | GE Research; Australian National University
Pseudocode | Yes | Algorithm 1: Finetuning & inference process of IMPUS
Open Source Code | Yes | Code is available at: https://github.com/GoL2022/IMPUS
Open Datasets | Yes | The data used comes from three main sources: 1) benchmark datasets for image generation, including Faces (50 pairs of random images for each subset of images from CelebA-HQ (Karras et al., 2018)), Animals (50 pairs of random images for each subset of images from AFHQ (Choi et al., 2020), including dog, cat, dog-cat, and wild), and Outdoors (50 pairs of church images from LSUN (Yu et al., 2015)); 2) internet images, e.g., the flower and beetle car examples; and 3) 25 image pairs from Wang & Golland (2023).
Dataset Splits | No | The data used comes from three main sources: 1) benchmark datasets for image generation, including Faces (50 pairs of random images for each subset of images from CelebA-HQ (Karras et al., 2018)), Animals (50 pairs of random images for each subset of images from AFHQ (Choi et al., 2020), including dog, cat, dog-cat, and wild), and Outdoors (50 pairs of church images from LSUN (Yu et al., 2015)).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned in the paper.
Software Dependencies | No | For all the experiments, we use a latent diffusion model (Rombach et al., 2022), with pre-trained weights from Stable-Diffusion-v-1-4. Textual inversion is trained with the AdamW optimizer (Loshchilov & Hutter, 2019), with the learning rate set to 0.002 for 2500 steps. For the benchmark datasets, we perform textual inversion for 1000 steps. LoRA is trained with the Adam optimizer (Kingma & Ba, 2015), with the learning rate set to 0.001.
Experiment Setup | Yes | We set the LoRA rank for unconditional score estimates to 2, and the default LoRA rank for conditional score estimates is chosen heuristically (auto). The conditional and unconditional parts are finetuned for 150 steps and 15 steps, respectively. The finetuning learning rate is set to 10^-3. Hyperparameters for textual inversion as well as guidance scales vary based on the dataset.
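The hyperparameters reported above can be collected into a single configuration sketch. This is a hypothetical summary for reference only: the key names and the `total_finetune_steps` helper are illustrative and do not come from the IMPUS codebase; the values are those stated in the paper.

```python
# Hypothetical config sketch of the reported IMPUS hyperparameters.
# Key names are illustrative, not taken from the official repository.
IMPUS_CONFIG = {
    "base_model": "Stable-Diffusion-v-1-4",  # pre-trained latent diffusion weights
    "textual_inversion": {
        "optimizer": "AdamW",       # Loshchilov & Hutter, 2019
        "learning_rate": 2e-3,
        "steps": 2500,              # 1000 steps for the benchmark datasets
    },
    "lora": {
        "optimizer": "Adam",        # Kingma & Ba, 2015
        "learning_rate": 1e-3,      # also the finetuning learning rate (10^-3)
        "rank_unconditional": 2,
        "rank_conditional": "auto", # chosen heuristically per the paper
        "steps_conditional": 150,
        "steps_unconditional": 15,
    },
}

def total_finetune_steps(cfg: dict) -> int:
    """Sum the conditional and unconditional LoRA finetuning steps."""
    lora = cfg["lora"]
    return lora["steps_conditional"] + lora["steps_unconditional"]

print(total_finetune_steps(IMPUS_CONFIG))  # 150 + 15 = 165
```

Guidance scales and per-dataset textual-inversion settings are omitted, since the paper states they vary by dataset.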