Neural Diffusion Models
Authors: Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs through experiments on many image generation benchmarks, including MNIST, CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood, achieving state-of-the-art results on ImageNet and CelebA-HQ, and produce high-quality samples. |
| Researcher Affiliation | Academia | 1University of Amsterdam 2Constructor University, Bremen. Correspondence to: Grigory Bartosh <g.bartosh@uva.nl>, Dmitry Vetrov <dvetrov@constructor.university>, Christian A. Naesseth <c.a.naesseth@uva.nl>. |
| Pseudocode | Yes | Algorithm 1 Learning NDM Algorithm 2 Sampling from NDM |
| Open Source Code | No | The paper does not provide a specific repository link, explicit code release statement, or indicate that code is in supplementary materials. |
| Open Datasets | Yes | We demonstrate NDMs with learnable transformations on the MNIST (Deng, 2012), CIFAR-10 (Krizhevsky et al., 2009), downsampled ImageNet (Deng et al., 2009; Van Den Oord et al., 2016) and CelebA-HQ-256 (Karras et al., 2017) datasets. |
| Dataset Splits | No | The paper mentions using test data but does not explicitly describe train/validation splits, stating only that NLL and NELBO are calculated on test data. |
| Hardware Specification | Yes | The training was performed using Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using the RK45 solver and U-Net architecture but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The hyper-parameters are presented in Table 6. In all experiments we use the same neural network architectures to parameterize both the generative process and the transformations Fφ. To facilitate the training process, we employed a polynomial decay learning rate schedule, which includes a warm-up phase for a specified number of training steps. During the warm-up phase, the learning rate is linearly increased from 10⁻⁸ to the peak learning rate. Once the peak learning rate is reached, the learning rate is linearly decayed to 10⁻⁸ until the final training step. |
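The time-continuous formulation quoted above allows sampling with off-the-shelf solvers such as the RK45 solver the paper mentions. A minimal sketch of that pattern, assuming SciPy's `solve_ivp` and substituting a toy linear drift for the paper's learned generative process (the `drift` function here is purely illustrative, not the NDM model):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical stand-in drift: in an NDM this would be defined by the
# learned generative process; here a simple linear ODE dx/dt = -0.5 x
# is used purely to illustrate the solver interface.
def drift(t, x):
    return -0.5 * x

# Integrate backwards from t=1 ("noise") to t=0 ("data") with the
# off-the-shelf RK45 solver, as in the paper's continuous-time sampling.
x1 = np.random.randn(4)  # toy noise sample
sol = solve_ivp(drift, (1.0, 0.0), x1, method="RK45",
                rtol=1e-5, atol=1e-8)
x0 = sol.y[:, -1]        # toy generated sample
```

For this toy linear drift the exact solution is `x0 = x1 * exp(0.5)`, which the adaptive RK45 solver recovers to within its tolerances.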
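The learning-rate schedule quoted in the experiment setup (linear warm-up from 10⁻⁸ to a peak value, then linear decay back to 10⁻⁸ by the final step) can be sketched as a small step-to-rate function. The function and argument names below are assumptions for illustration; the paper's actual peak rates and warm-up lengths are given in its Table 6:

```python
def lr_at_step(step, warmup_steps, total_steps, peak_lr, floor_lr=1e-8):
    """Piecewise-linear schedule: warm up to peak_lr, then decay to floor_lr."""
    if step < warmup_steps:
        # Warm-up phase: linear increase from floor_lr to peak_lr.
        frac = step / warmup_steps
        return floor_lr + frac * (peak_lr - floor_lr)
    # Decay phase: linear decrease from peak_lr back to floor_lr.
    frac = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return peak_lr - frac * (peak_lr - floor_lr)
```

For example, with `warmup_steps=100`, `total_steps=1000`, and `peak_lr=1e-3`, the rate is 10⁻⁸ at step 0, 10⁻³ at step 100, and 10⁻⁸ again at step 1000.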