Likelihood Training of Cascaded Diffusion Models via Hierarchical Volume-preserving Maps
Authors: Henry Li, Ronen Basri, Yuval Kluger
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our multi-scale likelihood model on a selection of datasets and tasks including density estimation, lossless compression, and out-of-distribution detection and observe significant improvements to the existing state-of-the-art, demonstrating the power behind a multi-scale prior for likelihood modeling. ... We evaluate both the Laplacian pyramid-based and wavelet-based variants of our proposed probabilistic cascading diffusion model (LP-PCDM and W-PCDM, respectively) in several settings. |
| Researcher Affiliation | Collaboration | 1Yale University, 2Meta AI, 3Weizmann Institute of Science {henry.li, yuval.kluger}@yale.edu ronen.basri@weizmann.ac.il |
| Pseudocode | No | The paper includes mathematical formulations and derivations but does not present any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code can be found at this https url. |
| Open Datasets | Yes | First, we begin on a general density estimation task on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet 32, 64, and 128 (Van Den Oord et al., 2016) datasets. |
| Dataset Splits | No | The paper mentions training and testing on datasets like CIFAR10 and ImageNet, and refers to a 'test set', but it does not explicitly specify the training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits for reproducibility). |
| Hardware Specification | Yes | All training is performed on 8x NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions specific software components like 'AdamW' and refers to prior work for architectural details ('VDM U-Net implementation in (Kingma et al., 2021)'), but it does not provide specific version numbers for any programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | We construct our cascaded diffusion models with antithetic time sampling and a learnable noise schedule as in (Kingma et al., 2021). ... For CIFAR10, we use two scales... We use a U-Net of depth 32, consisting of 32 residual blocks in the forward and reverse directions, respectively. ... We train with AdamW for 2 million updates. |
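The "antithetic time sampling" referenced in the Experiment Setup row (from Kingma et al., 2021) is a low-discrepancy scheme for drawing diffusion timesteps: a single uniform offset is drawn and the batch's times are then spread evenly over [0, 1), reducing the variance of the training loss estimate. A minimal sketch, assuming a batch of scalar timesteps (the function name here is illustrative, not from the paper's code):

```python
import random

def antithetic_time_sampling(batch_size, rng=random):
    """Low-discrepancy timestep sampling as in VDM (Kingma et al., 2021):
    draw one uniform offset u0, then place the batch's times evenly
    on [0, 1) at spacing 1/batch_size, wrapping modulo 1."""
    u0 = rng.random()
    return [(u0 + i / batch_size) % 1.0 for i in range(batch_size)]

# Example: 8 timesteps share one random offset but cover [0, 1) evenly,
# unlike 8 independent uniform draws, which can cluster.
times = antithetic_time_sampling(8)
```

Because the points are evenly spaced on the unit circle, every region of the noise schedule is visited in each batch, which is what drives the variance reduction relative to i.i.d. uniform sampling.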