Denoising Diffusion Bridge Models
Authors: Linqi Zhou, Aaron Lou, Samar Khanna, Stefano Ermon
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we apply DDBMs to challenging image datasets in both pixel and latent space. On standard image translation problems, DDBMs achieve significant improvement over baseline methods, and, when we reduce the problem to image generation by setting the source distribution to random noise, DDBMs achieve comparable FID scores to state-of-the-art methods despite being built for a more general task. We evaluate on datasets with different image resolutions to demonstrate its applicability on a variety of scales. For evaluation metrics, we use Fréchet Inception Distance (FID) (Heusel et al., 2017) and Inception Scores (IS) (Barratt and Sharma, 2018) evaluated on all training samples to measure translation quality, and we use LPIPS (Zhang et al., 2018) and MSE (in [−1, 1] scale) to measure perceptual similarity and translation faithfulness. We now study the effect of our preconditioning and hybrid samplers on generation quality in the context of both VE and VP bridges (see Appendix B for the VP bridge parameterization). In the left column of Figure 4, we fix the guidance scale w at 1 and vary the Euler step size s from 0 to 0.9 to introduce stochasticity. We see a significant decrease in FID as we increase s, which produces the best performance at some value between 0 and 1 (e.g., s = 0.3 for Edges→Handbags). Table 3: Ablation study on the effect of sampler and preconditioning on FID. |
| Researcher Affiliation | Academia | Linqi Zhou Aaron Lou Samar Khanna Stefano Ermon Department of Computer Science, Stanford University {linqizhou, aaronlou, samar.khanna, ermon}@stanford.edu |
| Pseudocode | Yes | We introduce an additional scaling hyperparameter s, which defines a step ratio between t_{i−1} and t_i such that the interval [t_i − s(t_i − t_{i−1}), t_i] is used for Euler–Maruyama steps and [t_{i−1}, t_i − s(t_i − t_{i−1})] is used for Heun steps, as described in Algorithm 1. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | We choose Edges→Handbags (Isola et al., 2017) scaled to 64×64 pixels, which contains image pairs for translating from edge maps to colored handbags, and DIODE-Outdoor (Vasiljevic et al., 2019) scaled to 256×256, which contains normal maps and RGB images of real-world outdoor scenes. We evaluate our method on CIFAR-10 (Krizhevsky et al., 2009) and FFHQ-64×64 (Karras et al., 2019), which are processed according to Karras et al. (2022). |
| Dataset Splits | No | The paper mentions evaluating some metrics "on all training samples" and processing datasets "according to Karras et al. (2022)" for others, but it does not explicitly state specific train/validation/test splits, percentages, or absolute counts for any dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Unless noted otherwise, we use the same VE diffusion schedule as in EDM for our bridge model by default. In the left column of Figure 4, we fix the guidance scale w at 1 and vary the Euler step size s from 0 to 0.9 to introduce stochasticity. Diffusion and transport-based methods are evaluated with the same number of function evaluations (N = 40, which is the default for the EDM sampler for 64×64 images). |
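The interval split behind the hybrid sampler (quoted under "Pseudocode" above) can be sketched as follows. This is an illustrative reconstruction, not the paper's Algorithm 1: the helper name `hybrid_intervals` is hypothetical, and only the partitioning of each step by the ratio s is shown, not the Euler–Maruyama or Heun updates themselves.

```python
def hybrid_intervals(timesteps, s):
    """For each step [t_{i-1}, t_i], return the sub-intervals
    (Heun part, Euler-Maruyama part) induced by step ratio s.

    With s = 0 every step is a deterministic Heun step; as s
    approaches 1, most of each step becomes stochastic
    Euler-Maruyama integration (hypothetical helper, assumed
    interval convention from the quoted description).
    """
    intervals = []
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):
        # Boundary between the two sub-steps: t_i - s * (t_i - t_{i-1}).
        t_mid = t - s * (t - t_prev)
        heun_part = (t_prev, t_mid)            # [t_{i-1}, t_i - s(t_i - t_{i-1})]
        euler_maruyama_part = (t_mid, t)       # [t_i - s(t_i - t_{i-1}), t_i]
        intervals.append((heun_part, euler_maruyama_part))
    return intervals
```

For example, with `timesteps = [0.0, 0.5, 1.0]` and `s = 0.4`, the first step splits at 0.3 into a Heun sub-interval (0.0, 0.3) and an Euler–Maruyama sub-interval (0.3, 0.5), matching the ablation's observation that intermediate values of s trade determinism for stochasticity.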