Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models

Authors: Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi

ICML 2024

Each entry below lists a reproducibility variable, the assessed result, and the LLM response quoting supporting evidence from the paper.
Research Type: Experimental — "We perform experiments on both linear and nonlinear inverse problems and demonstrate that our technique greatly improves the performance of the baseline solver and achieves up to 10× acceleration in mean sampling speed." "We perform in-depth numerical experiments on multiple datasets and on both linear and nonlinear noisy inverse problems. We demonstrate that (1) Flash-Diffusion greatly improves the reconstruction performance of the baseline solver, especially in case of degradations with highly varying severity, and (2) Flash-Diffusion accelerates the baseline solver by up to 10× without any loss in reconstruction quality."
Researcher Affiliation: Academia — "Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi. Dept. of Electrical and Comp. Eng., Univ. of Southern California, Los Angeles, CA. Correspondence to: Zalan Fabian <zfabian@usc.edu>."
Pseudocode: No — The paper provides mathematical formulations of its algorithms (e.g., the DDPM and DPS updates) but does not present them in structured pseudocode blocks or clearly labeled algorithm environments.
Open Source Code: Yes — "Code is available at https://github.com/z-fabian/flash-diffusion"
Open Datasets: Yes — "We perform experiments on CelebA-HQ (256×256) (Karras et al., 2018) and LSUN Bedrooms (Yu et al., 2015)."
Dataset Splits: Yes — "We match the training and validation splits used in (Rombach et al., 2022), and set aside 200 images from the validation split for testing. For experiments on CelebA-HQ and FFHQ datasets, we use the train and validation splits provided in the GitHub repo of 'Taming Transformers'. For LSUN Bedrooms experiments, we use the custom split provided in the GitHub repo of 'Latent Diffusion'."
Hardware Specification: Yes — "We use Quadro RTX 5000 and Titan GPUs."
Software Dependencies: No — The paper mentions using the Adam optimizer and implies a PyTorch implementation, but does not specify versions for the programming language (e.g., Python), the deep learning framework (e.g., PyTorch), or CUDA.
Experiment Setup: Yes — "Training setup: We train severity encoders using the Adam optimizer with batch size 28 and learning rate 0.0001 for about 200k steps, until the loss on the validation set converges. We use Quadro RTX 5000 and Titan GPUs. Hyperparameters: We scale the reconstruction loss terms with their corresponding dimension (d for L_lat.rec. and n for L_im.rec.), which we find to be sufficient without tuning for λ_im.rec.. We tune λ_σ via grid search on [0.1, 1, 10] on the varying Gaussian blur task and set it to 10 for all experiments. For Flash experiments, we tune the noise correction parameter c on the validation subset by grid search over [0.8, 1.0, 1.2] after tuning the remaining baseline method hyperparameters. ... We tune η_DPS > 0 by performing a grid search over [0.1, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0] for all experiments. We provide the optimal hyperparameters in Table 2."
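The tuning protocol quoted above amounts to simple grid searches over small candidate sets (for η_DPS and the noise correction parameter c). A minimal sketch of that protocol; the scoring function here is a toy stand-in, not the paper's validation metric:

```python
# Candidate grids quoted in the paper's experiment setup.
ETA_DPS_GRID = [0.1, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0]
NOISE_CORRECTION_GRID = [0.8, 1.0, 1.2]

def grid_search(score_fn, grid):
    """Return the candidate value with the highest validation score."""
    best_value, best_score = None, float("-inf")
    for value in grid:
        score = score_fn(value)
        if score > best_score:
            best_value, best_score = value, score
    return best_value

# Toy stand-in for "reconstruction quality on the validation subset";
# a real run would evaluate the diffusion solver for each candidate.
def toy_score(eta):
    return -(eta - 2.0) ** 2

best_eta = grid_search(toy_score, ETA_DPS_GRID)
```

In practice each `score_fn` evaluation is expensive (a full validation pass of the solver), which is why the paper keeps the grids this small.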