IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models
Authors: Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that our IMPUS can achieve smooth, direct, and realistic image morphing and is adaptable to several other generative tasks. |
| Researcher Affiliation | Collaboration | GE Research; Australian National University |
| Pseudocode | Yes | Algorithm 1 Finetuning & inference process of IMPUS |
| Open Source Code | Yes | Code is available at: https://github.com/GoL2022/IMPUS |
| Open Datasets | Yes | The data used comes from three main sources: 1) benchmark datasets for image generation, including Faces (50 pairs of random images for each subset of images from CelebA-HQ (Karras et al., 2018)), Animals (50 pairs of random images for each subset of images from AFHQ (Choi et al., 2020), including dog, cat, dog-cat, and wild), and Outdoors (50 pairs of church images from LSUN (Yu et al., 2015)), 2) internet images, e.g., the flower and beetle car examples, and 3) 25 image pairs from Wang & Golland (2023). |
| Dataset Splits | No | The data used comes from three main sources: 1) benchmark datasets for image generation, including Faces (50 pairs of random images for each subset of images from CelebA-HQ (Karras et al., 2018)), Animals (50 pairs of random images for each subset of images from AFHQ (Choi et al., 2020), including dog, cat, dog-cat, and wild), and Outdoors (50 pairs of church images from LSUN (Yu et al., 2015)). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned in the paper. |
| Software Dependencies | No | For all the experiments, we use a latent diffusion model (Rombach et al., 2022), with pre-trained weights from Stable-Diffusion-v-1-4. Textual inversion is trained with the AdamW optimizer (Loshchilov & Hutter, 2019), and the learning rate is set to 0.002 for 2500 steps. For the benchmark dataset, we perform textual inversion for 1000 steps. LoRA is trained with the Adam optimizer (Kingma & Ba, 2015), and the learning rate is set to 0.001. |
| Experiment Setup | Yes | We set the LoRA rank for unconditional score estimates to 2, and the default LoRA rank for conditional score estimates is chosen heuristically (auto). The conditional and unconditional parts are fine-tuned for 150 steps and 15 steps, respectively. The fine-tuning learning rate is set to 10⁻³. Hyperparameters for textual inversion as well as guidance scales vary based on the dataset. |
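The hyperparameters reported across the software-dependency and experiment-setup rows can be collected into a single configuration sketch. This is a minimal illustration only: the dictionary layout and the helper `steps_for` are hypothetical names, not structures from the paper or its repository; the numeric values are those quoted above.

```python
# Hedged sketch: IMPUS fine-tuning hyperparameters as reported in the paper,
# gathered into one plain-Python config. Key names are illustrative.

IMPUS_FINETUNE_CONFIG = {
    "base_model": "Stable-Diffusion-v-1-4",   # pre-trained latent diffusion weights
    "textual_inversion": {
        "optimizer": "AdamW",
        "learning_rate": 2e-3,
        "steps": 2500,                         # 1000 steps on the benchmark datasets
    },
    "lora": {
        "optimizer": "Adam",
        "learning_rate": 1e-3,                 # fine-tuning learning rate, 10^-3
        "rank_unconditional": 2,
        "rank_conditional": "heuristic",       # chosen automatically ("auto")
        "steps_conditional": 150,
        "steps_unconditional": 15,
    },
}


def steps_for(component: str, benchmark: bool = False) -> int:
    """Return the reported number of fine-tuning steps for a component."""
    if component == "textual_inversion":
        # Benchmark datasets use fewer textual-inversion steps.
        return 1000 if benchmark else IMPUS_FINETUNE_CONFIG["textual_inversion"]["steps"]
    if component == "lora_conditional":
        return IMPUS_FINETUNE_CONFIG["lora"]["steps_conditional"]
    if component == "lora_unconditional":
        return IMPUS_FINETUNE_CONFIG["lora"]["steps_unconditional"]
    raise ValueError(f"unknown component: {component!r}")
```

A reproduction attempt would still need the per-dataset textual-inversion settings and guidance scales, which the paper says vary by dataset.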