Diffusion Normalizing Flow

Authors: Qinsheng Zhang, Yongxin Chen

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our algorithm demonstrates competitive performance in both high-dimension data density estimation and image generation tasks." "4 Experiments: We evaluate the performance of DiffFlow in sample quality and likelihood on test data."
Researcher Affiliation | Academia | "Qinsheng Zhang, Georgia Institute of Technology, qzhang419@gatech.edu; Yongxin Chen, Georgia Institute of Technology, yongchen@gatech.edu"
Pseudocode | Yes | "Algorithm 1: Training", "Algorithm 2: Stochastic Adjoint Algorithm for DiffFlow"
Open Source Code | Yes | "We also include the PyTorch [34] implementation in the supplemental material."
Open Datasets | Yes | "We perform density estimation experiments on five tabular datasets [33]." "In this section, we report the quantitative comparison and qualitative performance of our method and existing methods on common image datasets, MNIST [24] and CIFAR-10 [23]."
Dataset Splits | No | The paper mentions training on the datasets but does not give explicit percentages or counts for training, validation, or test splits. It refers to established datasets such as MNIST and CIFAR-10, which have standard splits, but those splits are not spelled out in the paper's text.
Hardware Specification | No | The paper gives no details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper cites "PyTorch [34]" but does not report a version number for it or for any other software dependency.
Experiment Setup | Yes | "We use the same unconstrained U-net style model as used successfully by Ho et al. [17] for drift and score network. We reduce the network size to half of the original DDPM network so that the total number of trainable parameters of DiffFlow and DDPM are comparable. We use small N = 10 at the beginning of training and slowly increase to large N as training proceeds. The schedule of N reduces the training time greatly compared with using large N all the time. We use constants g_i = 1 and T = 0.05 for MNIST and CIFAR10, and N = 30 for sampling MNIST data and N = 100 for sampling CIFAR10. Empirically, we found β = 0.9 works well."
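
The Experiment Setup row above quotes the hyperparameters reported for the image experiments (the growing N schedule, g_i = 1, T = 0.05, N = 30/100 for sampling, β = 0.9). A minimal sketch of how such a configuration could be collected in code is shown below; the dataclass, its field names, the linear form of the schedule, and the training end value N = 100 are illustrative assumptions, not taken from the authors' released implementation.

```python
# Hypothetical configuration sketch for the reported DiffFlow image experiments.
# Only the numeric values come from the paper's quoted setup; the field names,
# the linear schedule, and the training end value of N are assumptions.
from dataclasses import dataclass


@dataclass
class DiffFlowImageConfig:
    dataset: str             # "MNIST" or "CIFAR10"
    T: float = 0.05          # terminal time of the discretized diffusion
    g: float = 1.0           # constant diffusion coefficient g_i = 1
    beta: float = 0.9        # beta value reported to work well empirically
    n_train_start: int = 10  # small N at the beginning of training ...
    n_train_end: int = 100   # ... increased to a large N (end value assumed)
    n_sample: int = 30       # discretization steps used at sampling time


def n_schedule(step: int, total_steps: int, cfg: DiffFlowImageConfig) -> int:
    """Increase the number of discretization steps N as training proceeds.

    The paper only states that N grows from a small value (10) to a large
    value; the linear interpolation here is an assumption.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return round(cfg.n_train_start + frac * (cfg.n_train_end - cfg.n_train_start))


# Reported sampling settings: N = 30 for MNIST, N = 100 for CIFAR-10.
mnist_cfg = DiffFlowImageConfig(dataset="MNIST", n_sample=30)
cifar_cfg = DiffFlowImageConfig(dataset="CIFAR10", n_sample=100)

if __name__ == "__main__":
    for s in (0, 5_000, 10_000):
        print(s, n_schedule(s, total_steps=10_000, cfg=cifar_cfg))
```

The paper motivates the schedule by noting that starting with small N and increasing it "reduces the training time greatly compared with using large N all the time", which is why training and sampling step counts are kept as separate settings in this sketch.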
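
The Dataset Splits row above notes that the paper relies on the standard MNIST and CIFAR-10 splits without stating them explicitly. For reference, a minimal torchvision sketch of those standard splits follows; it is not taken from the paper's supplemental code.

```python
# Standard splits referenced in the Dataset Splits row (not specified in the paper):
# MNIST has 60,000 train / 10,000 test images; CIFAR-10 has 50,000 train / 10,000 test.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist_train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)
```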