Likelihood Training of Cascaded Diffusion Models via Hierarchical Volume-preserving Maps

Authors: Henry Li, Ronen Basri, Yuval Kluger

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our multi-scale likelihood model on a selection of datasets and tasks including density estimation, lossless compression, and out-of-distribution detection and observe significant improvements to the existing state-of-the-art, demonstrating the power behind a multi-scale prior for likelihood modeling. ... We evaluate both the Laplacian pyramid-based and wavelet-based variants of our proposed probabilistic cascading diffusion model (LP-PCDM and W-PCDM, respectively) in several settings." (For context on the Laplacian pyramid map, see the first sketch following the table.)
Researcher Affiliation | Collaboration | ¹Yale University, ²Meta AI, ³Weizmann Institute of Science. Contact: {henry.li, yuval.kluger}@yale.edu, ronen.basri@weizmann.ac.il
Pseudocode | No | The paper includes mathematical formulations and derivations but does not present any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Code can be found at this https URL."
Open Datasets | Yes | "First, we begin on a general density estimation task on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet 32, 64, and 128 (Van Den Oord et al., 2016) datasets."
Dataset Splits | No | The paper trains and evaluates on datasets such as CIFAR10 and ImageNet and refers to a "test set", but it does not explicitly specify the training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproducibility.
Hardware Specification | Yes | "All training is performed on 8x NVIDIA RTX A6000 GPUs."
Software Dependencies | No | The paper mentions specific software components like "AdamW" and refers to prior work for architectural details ("VDM U-Net implementation in (Kingma et al., 2021)"), but it does not provide version numbers for any programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | "We construct our cascaded diffusion models with antithetic time sampling and a learnable noise schedule as in (Kingma et al., 2021). ... For CIFAR10, we use two scales... We use a U-Net of depth 32, consisting of 32 residual blocks in the forward and reverse directions, respectively. ... We train with AdamW for 2 million updates." (See the second sketch below for the antithetic sampling step.)
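
As context for the LP-PCDM row above: a Laplacian pyramid is the hierarchical map underlying the paper's first variant. Below is a minimal PyTorch sketch of the textbook, exactly invertible pyramid decomposition, assuming average-pool downsampling and bilinear upsampling; the paper's actual construction additionally requires the map to be volume-preserving (unit Jacobian determinant), a normalization not reproduced here. The names `laplacian_pyramid` and `reconstruct` are illustrative, not taken from the released code.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(x: torch.Tensor, num_scales: int = 2):
    """Split a (B, C, H, W) batch into high-frequency detail bands plus
    a coarse residual. Exactly inverted by reconstruct() below."""
    bands = []
    current = x
    for _ in range(num_scales - 1):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, scale_factor=2, mode="bilinear",
                           align_corners=False)
        bands.append(current - up)  # detail band at this resolution
        current = down
    bands.append(current)           # coarsest low-frequency image
    return bands

def reconstruct(bands):
    """Invert the pyramid: upsample the coarse band and add details back."""
    current = bands[-1]
    for detail in reversed(bands[:-1]):
        current = F.interpolate(current, scale_factor=2, mode="bilinear",
                                align_corners=False) + detail
    return current
```

Round-trip check: `reconstruct(laplacian_pyramid(x))` recovers `x` up to floating-point error. Exact invertibility is what a change-of-variables likelihood needs; volume preservation is the additional constraint the paper imposes so that log-likelihoods transfer across scales without a Jacobian correction.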
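
The Experiment Setup row quotes "antithetic time sampling ... as in (Kingma et al., 2021)". Below is a hedged sketch of the textbook antithetic construction: timesteps are drawn in negatively correlated pairs (u, 1 - u), which reduces the variance of the Monte Carlo estimate of the diffusion loss. This is an assumption about the exact sampler; Kingma et al. (2021) also describe a closely related low-discrepancy scheme, and the paper does not spell out which detail it adopts.

```python
import torch

def antithetic_times(batch_size: int, device: str = "cpu") -> torch.Tensor:
    """Draw batch_size diffusion timesteps in [0, 1] as antithetic
    pairs (u, 1 - u) to reduce loss-estimator variance."""
    half = (batch_size + 1) // 2
    u = torch.rand(half, device=device)
    return torch.cat([u, 1.0 - u])[:batch_size]

# Usage in a training step (hypothetical names for the model pieces):
# t = antithetic_times(x.shape[0], device=x.device)
# loss = diffusion_loss(model, x, t)  # e.g., a VDM-style ELBO term
```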