Neural Diffusion Models

Authors: Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs through experiments on many image generation benchmarks, including MNIST, CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood, achieving state-of-the-art results on ImageNet and CelebA-HQ, and produce high-quality samples.
Researcher Affiliation | Academia | ¹University of Amsterdam, ²Constructor University, Bremen. Correspondence to: Grigory Bartosh <g.bartosh@uva.nl>, Dmitry Vetrov <dvetrov@constructor.university>, Christian A. Naesseth <c.a.naesseth@uva.nl>.
Pseudocode | Yes | Algorithm 1 (Learning NDM) and Algorithm 2 (Sampling from NDM); see the illustrative training-step sketch after this table.
Open Source Code | No | The paper does not provide a repository link, make an explicit code-release statement, or indicate that code is included in the supplementary materials.
Open Datasets | Yes | We demonstrate NDMs with learnable transformations on the MNIST (Deng, 2012), CIFAR-10 (Krizhevsky et al., 2009), downsampled ImageNet (Deng et al., 2009; van den Oord et al., 2016) and CelebA-HQ-256 (Karras et al., 2017) datasets.
Dataset Splits | No | The paper mentions using test data but does not explicitly describe validation splits; it states only that NLL and NELBO are computed on test data.
Hardware Specification | Yes | The training was performed using Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using the RK45 solver and a U-Net architecture but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | The hyper-parameters are presented in Table 6. In all experiments we use the same neural network architectures to parameterize both the generative process and the transformations Fφ. To facilitate training, we employed a polynomial decay learning rate schedule, which includes a warm-up phase for a specified number of training steps. During the warm-up phase, the learning rate is linearly increased from 10⁻⁸ to the peak learning rate; once the peak learning rate is reached, it is linearly decayed back to 10⁻⁸ until the final training step. A sketch of this schedule follows below the table.
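
The pseudocode row above refers to Algorithm 1 (Learning NDM). As a rough illustration of the simulation-free structure that algorithm relies on (noisy latents are formed in closed form from the transformed data, so no forward-process simulation is needed), here is a minimal PyTorch sketch. The `transform` callable stands in for the learned transformation Fφ and `denoiser` for the generative network; the cosine noise schedule and the simplified surrogate loss are illustrative assumptions, not the paper's full variational bound.

    import torch

    def ndm_style_training_step(x, transform, denoiser, optimizer):
        """Illustrative simulation-free training step (not the paper's exact objective).

        x:         batch of images, shape (B, C, H, W)
        transform: callable (x, t) -> transformed data, standing in for F_phi
        denoiser:  callable (z_t, t) -> prediction of the clean data
        """
        b = x.shape[0]
        t = torch.rand(b, device=x.device)                        # uniform timesteps in [0, 1]
        alpha = torch.cos(0.5 * torch.pi * t).view(-1, 1, 1, 1)   # assumed cosine schedule
        sigma = torch.sin(0.5 * torch.pi * t).view(-1, 1, 1, 1)

        # Closed-form marginal of the latent: no trajectory simulation required.
        eps = torch.randn_like(x)
        z_t = alpha * transform(x, t) + sigma * eps

        x_hat = denoiser(z_t, t)

        # Simplified surrogate loss: match the transformed prediction to the
        # transformed data. The paper instead optimises a full variational bound.
        loss = ((transform(x_hat, t) - transform(x, t)) ** 2).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

If the optimizer holds the parameters of both networks, the transformation and the denoiser are trained jointly in this sketch.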
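
The learning-rate schedule described in the experiment-setup row (linear warm-up from 10⁻⁸ to a peak value, then decay back to 10⁻⁸ by the final step) can be written in a few lines. The function and argument names below are illustrative, the decay is taken to be linear (degree-1 polynomial), and the actual peak learning rate and step counts are the hyper-parameters given in the paper's Table 6.

    def learning_rate(step, total_steps, warmup_steps, peak_lr, floor_lr=1e-8):
        """Linear warm-up to peak_lr, then linear decay back to floor_lr (illustrative)."""
        if step < warmup_steps:
            # Warm-up: increase linearly from floor_lr to peak_lr.
            frac = step / max(warmup_steps, 1)
            return floor_lr + frac * (peak_lr - floor_lr)
        # Decay: decrease linearly from peak_lr back to floor_lr by the final step.
        frac = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
        return peak_lr + frac * (floor_lr - peak_lr)

    # Example with placeholder values (not the settings from Table 6):
    # lr = learning_rate(step=20_000, total_steps=500_000, warmup_steps=10_000, peak_lr=2e-4)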