Image Inpainting via Tractable Steering of Diffusion Models
Authors: Anji Liu, Mathias Niepert, Guy Van den Broeck
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on three challenging high-resolution natural image datasets (i.e., CelebA-HQ, ImageNet, and LSUN) show that the proposed method Tiramisu consistently improves the overall quality and semantic coherence of inpainted images while introducing only 10% additional computational overhead brought by the TPM, which is the joint effort of (i) further scaling up PC models based on prior art, and (ii) an improved custom GPU implementation for PC training and inference. |
| Researcher Affiliation | Academia | Anji Liu¹, Mathias Niepert², Guy Van den Broeck¹. ¹Department of Computer Science, University of California, Los Angeles; ²Department of Computer Science, University of Stuttgart |
| Pseudocode | Yes | To facilitate reproducibility, we have clearly stated the proposed PC inference algorithm in Section 4.2. Specifically, Section 4.2 describes 'The forward pass' and 'The backward pass' with clear algorithmic steps and formulas, which constitutes a structured algorithm description (a generic illustration of this forward/backward pattern appears after the table). |
| Open Source Code | Yes | Code is available at https://github.com/UCLA-StarAI/Tiramisu. We provide the code to train the PCs and to generate inpainted images at https://github.com/UCLA-StarAI/Tiramisu. |
| Open Datasets | Yes | Quantitative and qualitative results Table 1 shows the average LPIPS values (Zhang et al., 2018) on three datasets: CelebA-HQ (Liu et al., 2015), ImageNet (Deng et al., 2009), and LSUN-Bedroom (Yu et al., 2015). Pretrained Models For all inpainting algorithms, we adopt the same diffusion model checkpoint pretrained by Lugmayr et al. (2022) (for CelebA-HQ) and OpenAI (for ImageNet and LSUN-Bedroom; https://github.com/openai/guided-diffusion). The links to the checkpoints for all three datasets are listed below. CelebA-HQ: https://drive.google.com/uc?id=1norNWWGYP3EZ_o05DmoW1ryKuKMmhlCX ImageNet: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion.pt and https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_classifier.pt. LSUN-Bedroom: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_bedroom.pt. |
| Dataset Splits | Yes | For CelebA-HQ and LSUN-Bedroom, we use the first 100 images in the validation set. We adopt the validation split of CelebA-HQ following Suvorov et al. (2022). For ImageNet, we use a random validation image for the first 100 classes. In all experiments, we used the first three samples in the validation set to tune the mixing hyperparameters. |
| Hardware Specification | No | The paper discusses computational overhead and runtime, but it does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions deep learning models like VQ-GAN and diffusion models and links to pretrained model checkpoints (e.g., from OpenAI), but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x). |
| Experiment Setup | Yes | In all experiments, we compute $w^z_i(z^i_0)$ by first drawing 4 samples from $\prod_j w_j(x^j_0)$, and then feed them to the VQ-GAN's encoder. For every sample, we get a distribution over variable $Z^i_0$. $w^z_i(z^i_0)$ is then computed as the mean of the four distributions. In the following decoding phase that computes $p_{\mathrm{TPM}}(x_0 \mid x_t, x^k_0) := \mathbb{E}_{z_0 \sim p_{\mathrm{TPM}}(\cdot \mid x_t, x^k_0)}[p(x_0 \mid z_0)]$, we draw 8 random samples from $p_{\mathrm{TPM}}(\cdot \mid x_t, x^k_0)$ to estimate $p_{\mathrm{TPM}}(x_0 \mid x_t, x^k_0)$. Table 3 (mixing hyperparameters of Tiramisu): $a = 0.8$, $b = 1.0$, $\lambda = 2.0$, $t_{\mathrm{cut}} = 200$; in all settings, $T = 250$. Table 4 (hyperparameters of the EM fine-tuning process): step size 1.0, batch size 20000, pseudocount 0.1, # iterations 200 (a hedged sketch of these sampling steps also follows the table). |
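The paper's Section 4.2 pseudocode is not reproduced in this report. As a generic illustration of the forward/backward probabilistic-circuit inference pattern it builds on, the sketch below evaluates a toy two-variable circuit with PyTorch autograd; the circuit structure and all names are illustrative assumptions, not the Tiramisu codebase. The forward pass evaluates the circuit on evidence indicators, and the backward pass exploits the standard derivative identity for smooth, decomposable circuits, $\partial f / \partial \lambda_{i,v} = p(X_i = v, e)$, to read off posterior marginals in a single sweep.

```python
# Minimal sketch (NOT the paper's Tiramisu implementation) of generic
# forward/backward PC inference. The toy circuit below is an assumption.
import torch

def pc_forward(lam):
    # Toy PC over two binary variables X1, X2:
    #   p(x) = 0.6 * p1(x1) * p2(x2) + 0.4 * q1(x1) * q2(x2)
    # lam[i, v] is the evidence indicator for X_{i+1} = v.
    p1 = 0.8 * lam[0, 0] + 0.2 * lam[0, 1]  # leaf over X1 (branch A)
    p2 = 0.3 * lam[1, 0] + 0.7 * lam[1, 1]  # leaf over X2 (branch A)
    q1 = 0.1 * lam[0, 0] + 0.9 * lam[0, 1]  # leaf over X1 (branch B)
    q2 = 0.5 * lam[1, 0] + 0.5 * lam[1, 1]  # leaf over X2 (branch B)
    return 0.6 * p1 * p2 + 0.4 * q1 * q2    # root sum node

# Evidence: X1 = 0 observed; X2 unobserved (indicators = 1 marginalize it out).
lam = torch.tensor([[1.0, 0.0],
                    [1.0, 1.0]], requires_grad=True)

prob_evidence = pc_forward(lam)   # forward pass: p(X1 = 0)
prob_evidence.backward()          # backward pass: one autograd sweep

# Derivative identity: d f / d lam[i, v] = p(X_i = v, evidence), so the
# normalized gradients of an unobserved variable give its posterior marginal.
marginal_x2 = lam.grad[1] / prob_evidence.detach()
print(prob_evidence.item(), marginal_x2.tolist())  # 0.52, [0.3154, 0.6846]
```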
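The quoted experiment setup reduces to two small Monte Carlo averages: estimating the latent soft evidence $w^z_i(z^i_0)$ as the mean of VQ-GAN encoder distributions over 4 pixel-space samples, and estimating $p_{\mathrm{TPM}}(x_0 \mid x_t, x^k_0)$ by decoding 8 latent samples and averaging. The sketch below shows the shape of those computations; all sampler/encoder/decoder callables are hypothetical stand-ins, not the released implementation.

```python
# Hedged sketch of the two Monte Carlo steps from the experiment setup.
# All callables passed in are hypothetical placeholders.
import torch

N_ENC_SAMPLES = 4  # samples fed to the VQ-GAN encoder (per the quoted setup)
N_DEC_SAMPLES = 8  # latent samples drawn for the decoding phase

def soft_evidence(sample_pixel_evidence, encoder_dist):
    # sample_pixel_evidence() ~ prod_j w_j(x^j_0)   (hypothetical sampler)
    # encoder_dist(x0) -> categorical probs over each latent code Z^i_0
    dists = torch.stack([encoder_dist(sample_pixel_evidence())
                         for _ in range(N_ENC_SAMPLES)])
    return dists.mean(dim=0)   # w^z_i(z^i_0): mean of the 4 distributions

def estimate_p_x0(sample_latent_posterior, decoder_dist):
    # sample_latent_posterior() ~ p_TPM(. | x_t, x^k_0)  (hypothetical sampler)
    # decoder_dist(z0) -> parameters of p(x0 | z0) from the VQ-GAN decoder
    outs = torch.stack([decoder_dist(sample_latent_posterior())
                        for _ in range(N_DEC_SAMPLES)])
    return outs.mean(dim=0)    # E_{z0}[p(x0 | z0)] estimated with 8 samples

if __name__ == "__main__":
    # Toy demo with dummy stand-ins just to exercise the two estimators.
    enc = lambda x0: torch.softmax(x0, dim=-1)  # dummy "encoder"
    dec = lambda z0: torch.sigmoid(z0)          # dummy "decoder"
    sampler = lambda: torch.randn(16)           # dummy sampler
    print(soft_evidence(sampler, enc).shape)
    print(estimate_p_x0(sampler, dec).shape)
```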