Star-Shaped Denoising Diffusion Probabilistic Models

Authors: Andrey Okhotin, Dmitry Molchanov, Vladimir Arkhipkin, Grigory Bartosh, Viktor Ohanesian, Aibek Alanov, Dmitry P. Vetrov

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the model in different settings and find it competitive even on image data, where Beta SS-DDPM achieves results comparable to a Gaussian DDPM. Our implementation is available at https://github.com/andrey-okhotin/star-shaped.
Researcher Affiliation | Collaboration | Andrey Okhotin (HSE University, MSU, Moscow, Russia; andrey.okhotin@gmail.com); Dmitry Molchanov (BAYESG, Budva, Montenegro; dmolch111@gmail.com); Vladimir Arkhipkin (Sber AI, Moscow, Russia; arkhipkin.v98@gmail.com); Grigory Bartosh (AMLab, Informatics Institute, University of Amsterdam, Amsterdam, Netherlands; g.bartosh@uva.nl); Viktor Ohanesian (Independent Researcher; v.v.oganesyan@gmail.com); Aibek Alanov (AIRI, HSE University, Moscow, Russia; alanov.aibek@gmail.com); Dmitry Vetrov (Constructor University, Bremen, Germany; dvetrov@constructor.university)
Pseudocode | Yes | Algorithm 1 (SS-DDPM training) ... Algorithm 2 (SS-DDPM sampling); see the sketch after this table.
Open Source Code | Yes | Our implementation is available at https://github.com/andrey-okhotin/star-shaped.
Open Datasets | Yes | We apply SS-DDPM to a geodesic dataset of fires on the Earth's surface (EOSDIS, 2020) ... apply Categorical SS-DDPM to unconditional text generation on the text8 dataset (Mahoney, 2011). ... Finally, we evaluate SS-DDPM on CIFAR-10.
Dataset Splits | Yes | We use a standard 90,000,000/5,000,000/500,000 train-test-validation split and train the neural network for 512 epochs.
Hardware Specification | Yes | Time costs when using 3 NVIDIA A100 GPUs: training took approx. 112 hours and estimating NLL on the test set took approx. 2.5 hours. ... Time costs when using 4 NVIDIA 1080 GPUs: training took approx. 96 hours; sampling 50,000 images took approx. 10 hours.
Software Dependencies | No | The paper mentions specific software tools like "Adam", "AdamW", and "NCSN++", but does not provide version numbers for these or other software dependencies required for reproducibility.
Experiment Setup | Yes | All models on synthetic data were trained for 350k iterations with batch size 128. ... We optimize Dirichlet SS-DDPM on the VLB objective without any modifications using Adam with a learning rate of 0.0004. The DDPM was trained on L_vlb using Adam with a learning rate of 0.0002. ... We optimize L_vlb using the AdamW optimizer with a learning rate of 0.0002 and exponential decay with γ = 0.999997. The model is trained for 2,000,000 iterations with batch size 100. For inference, we also use EMA weights with a decay of 0.9999. (A configuration sketch follows the table.)
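
The Pseudocode row cites Algorithms 1 and 2 of the paper. The sketch below is a minimal rendering of their structure, specialized to the Gaussian case; the noise schedule, the tail-statistic weights `w`, and the MSE surrogate loss are illustrative assumptions rather than the authors' implementation, which derives family-specific tail statistics and trains on the variational lower bound.

```python
# Hedged sketch of the SS-DDPM training/sampling loops (Algorithms 1-2),
# Gaussian case. Schedule, weights, and loss are assumptions for illustration.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # \bar{alpha}_t for t = 1..T

# Assumed per-step weights for aggregating samples into the tail statistic.
w = torch.sqrt(alpha_bar) / (1.0 - alpha_bar)

def q_sample(x0, t):
    """x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).
    Star-shaped: every x_t is drawn from x_0 directly, not from x_{t+1}."""
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * torch.randn_like(x0)

def tail_statistic(x0, t):
    """G_t summarizes x_s ~ q(x_s | x_0) for all s >= t; the denoiser is
    conditioned on G_t rather than on a single noisy sample."""
    xs = torch.stack([q_sample(x0, s) for s in range(t, T)])
    ws = w[t:].view(-1, *([1] * x0.dim()))
    return (ws * xs).sum(dim=0) / w[t:].sum()

def training_step(denoiser, x0):
    """Algorithm 1, one step: predict x_0 from (G_t, t). An MSE surrogate
    stands in here for the per-family KL terms of the VLB."""
    t = int(torch.randint(1, T, ()).item())
    x0_hat = denoiser(tail_statistic(x0, t), t)
    return ((x0_hat - x0) ** 2).mean()

@torch.no_grad()
def sample(denoiser, shape):
    """Algorithm 2: predict x_0 from the running tail statistic, draw
    x_{t-1} ~ q(x_{t-1} | x0_hat), and fold it into G."""
    x = torch.randn(shape)                       # x_T from the prior
    g, z = w[T - 1] * x, w[T - 1].clone()        # weighted sum and normalizer
    for t in range(T - 1, 0, -1):
        x0_hat = denoiser(g / z, t)
        x = q_sample(x0_hat, t - 1)              # x_{t-1} ~ q(x_{t-1} | x0_hat)
        g, z = g + w[t - 1] * x, z + w[t - 1]
    return x0_hat
```

With a trivial stand-in such as `denoiser = lambda g, t: g`, both `training_step` and `sample` run end to end, which makes the control flow easy to verify before plugging in a real network.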
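The CIFAR-10 setup in the Experiment Setup row (AdamW at learning rate 0.0002, exponential decay with γ = 0.999997, EMA decay 0.9999) maps onto standard PyTorch pieces. The sketch below is one plausible wiring under those numbers; `model` is a stand-in for the actual denoising network (an NCSN++ backbone in the paper), and applying the LR decay once per iteration is an assumption.

```python
import torch

# Stand-in for the denoising network; the paper uses an NCSN++ backbone.
model = torch.nn.Linear(16, 16)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999997)

# Shadow copy of the weights for the EMA (decay 0.9999), used at inference
# via model.load_state_dict(ema).
ema = {k: v.detach().clone() for k, v in model.state_dict().items()}

def training_iteration(loss_fn, batch):
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    optimizer.step()
    scheduler.step()  # assumed: decay applied once per iteration
    with torch.no_grad():
        for k, v in model.state_dict().items():
            ema[k].mul_(0.9999).add_(v, alpha=1.0 - 0.9999)
    return loss.item()
```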