Understanding DDPM Latent Codes Through Optimal Transport
Authors: Valentin Khrulkov, Gleb Ryzhakov, Andrei Chertkov, Ivan Oseledets
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support this hypothesis by extensive numerical experiments using an advanced tensor-train solver for the multidimensional Fokker-Planck equation. We study the DDPM encoder map by numerically solving the Fokker-Planck equation on a large class of synthetic distributions and show that the equality holds up to negligible errors. We provide additional qualitative empirical evidence supporting our hypothesis on real image datasets. (A toy one-dimensional version of this check is sketched after the table.) |
| Researcher Affiliation | Collaboration | Valentin Khrulkov (Yandex, Moscow, Russia; khrulkov.v@gmail.com); Gleb Ryzhakov and Andrei Chertkov (Skolkovo Institute of Science and Technology, Moscow, Russia; {a.chertkov,g.ryzhakov}@skoltech.ru); Ivan Oseledets (Skolkovo Institute of Science and Technology and AIRI, Moscow, Russia; i.oseledets@skoltech.ru) |
| Pseudocode | No | The paper describes its methods in narrative text and does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code is available in the supplementary material. We use the official implementation available on GitHub. |
| Open Datasets | Yes | Datasets. We consider the AFHQ animal dataset (Choi et al., 2020). It consists of 15,000 images split into 3 categories: cat, dog, and wild. This is a common benchmark for image-to-image methods. We also verify our theory on the FFHQ dataset of 70,000 human faces (Karras et al., 2019) and the MetFaces dataset (Karras et al., 2020) consisting of 1,000 human portraits. Finally, we consider a conditional DDPM on the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions training steps for the models and using a 'validation subset' of the AFHQ dataset for numerical results, but it does not specify explicit percentages or sample counts for the training, validation, and test splits needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper states that 'these calculations were carried out on a regular laptop' but does not provide specific details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the 'Python Optimal Transport (POT) library' and a 'Runge-Kutta method' but does not specify exact version numbers for these software dependencies or for the Python interpreter. (A minimal POT usage sketch follows the table.) |
| Experiment Setup | Yes | For each of the datasets we train a separate DDPM model with the same config as utilized for the LSUN datasets in Dhariwal & Nichol (2021) (with dropout); we use the default 1000 timesteps for sampling. The AFHQ models were trained for 3×10^5 steps and the FFHQ model for 10^6 steps; for the MetFaces model, we finetune the FFHQ checkpoint for 25×10^3 steps, similar to Choi et al. (2021). All models were trained at 256×256 resolution. |
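The paper's central claim, that the DDPM encoder obtained by integrating the probability-flow ODE matches the Monge optimal transport map up to negligible errors, can be checked in closed form for a one-dimensional Gaussian. The sketch below is not the authors' code: the linear beta(t) schedule and all parameter values are assumptions chosen to mimic a standard VP-SDE setup, and SciPy's default RK45 integrator stands in for the Runge-Kutta method mentioned above.

```python
# Toy 1D check of the paper's hypothesis: for a Gaussian data
# distribution, the DDPM encoder (the probability-flow ODE integrated
# from t=0 to t=1) should match the monotone Monge OT map to N(0, 1).
# Illustrative sketch only; beta(t), mu0, sigma0 are assumed values.
import numpy as np
from scipy.integrate import solve_ivp

mu0, sigma0 = 2.0, 0.5                 # data distribution N(mu0, sigma0^2)
beta = lambda t: 0.1 + 19.9 * t        # assumed linear VP-SDE schedule

def marginal(t):
    # Closed-form mean/std of p_t when p_0 = N(mu0, sigma0^2), using
    # alpha(t) = exp(-0.5 * int_0^t beta(s) ds).
    a = np.exp(-0.5 * (0.1 * t + 9.95 * t**2))
    return mu0 * a, np.sqrt(sigma0**2 * a**2 + 1.0 - a**2)

def pf_ode(t, x):
    # Probability-flow ODE: dx/dt = -0.5*beta(t)*(x + score(x, t)),
    # with the exact Gaussian score -(x - m_t) / s_t^2.
    m, s = marginal(t)
    return -0.5 * beta(t) * (x - (x - m) / s**2)

x0 = np.linspace(mu0 - 2 * sigma0, mu0 + 2 * sigma0, 9)  # data points
sol = solve_ivp(pf_ode, (0.0, 1.0), x0, rtol=1e-8, atol=1e-10)  # RK45
encoded = sol.y[:, -1]
ot_map = (x0 - mu0) / sigma0           # Monge map N(mu0, sigma0^2) -> N(0, 1)
print(np.max(np.abs(encoded - ot_map)))
```

The printed residual is small but nonzero because alpha(1) is not exactly zero for any finite schedule, which is consistent with the paper's statement that the equality holds "up to negligible errors".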
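Since the paper names the POT library without pinning a version, the following is a hedged sketch of how a discrete optimal transport baseline of the kind the paper compares against could be set up with POT's public API (`ot.dist` and `ot.emd`); the samples and sizes here are synthetic stand-ins, not the paper's data.

```python
# Hedged sketch of a discrete OT baseline with the POT library that the
# paper mentions (exact version unknown). Samples and sizes are
# synthetic stand-ins; the paper compares against DDPM latent codes.
import numpy as np
import ot  # Python Optimal Transport, `pip install pot`

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=0.5, size=(200, 2))  # "data" samples
Y = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # "latent" samples

a = np.full(200, 1 / 200)  # uniform source weights
b = np.full(200, 1 / 200)  # uniform target weights
M = ot.dist(X, Y)          # squared-Euclidean cost matrix
G = ot.emd(a, b, M)        # exact OT plan via linear programming

# With equal uniform weights the plan is (1/n times) a permutation,
# so argmax recovers a discrete Monge map: data point -> latent point.
match = G.argmax(axis=1)
w2 = np.sqrt(np.sum(G * M))  # empirical 2-Wasserstein distance
print(match[:5], w2)
```

Exact `ot.emd` scales poorly with sample count; for larger point sets an entropic solver such as `ot.sinkhorn` is the usual drop-in substitute at the cost of a blurred plan.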