Star-Shaped Denoising Diffusion Probabilistic Models
Authors: Andrey Okhotin, Dmitry Molchanov, Vladimir Arkhipkin, Grigory Bartosh, Viktor Ohanesian, Aibek Alanov, Dmitry P. Vetrov
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the model in different settings and find it competitive even on image data, where Beta SS-DDPM achieves results comparable to a Gaussian DDPM. Our implementation is available at https://github.com/andrey-okhotin/star-shaped. |
| Researcher Affiliation | Collaboration | Andrey Okhotin (HSE University and MSU, Moscow, Russia; andrey.okhotin@gmail.com); Dmitry Molchanov (BAYESG, Budva, Montenegro; dmolch111@gmail.com); Vladimir Arkhipkin (Sber AI, Moscow, Russia; arkhipkin.v98@gmail.com); Grigory Bartosh (AMLab, Informatics Institute, University of Amsterdam, Amsterdam, Netherlands; g.bartosh@uva.nl); Viktor Ohanesian (Independent Researcher; v.v.oganesyan@gmail.com); Aibek Alanov (AIRI and HSE University, Moscow, Russia; alanov.aibek@gmail.com); Dmitry Vetrov (Constructor University, Bremen, Germany; dvetrov@constructor.university) |
| Pseudocode | Yes | Algorithm 1 SS-DDPM training ... Algorithm 2 SS-DDPM sampling |
| Open Source Code | Yes | Our implementation is available at https://github.com/andrey-okhotin/star-shaped. |
| Open Datasets | Yes | We apply SS-DDPM to a geodesic dataset of fires on the Earth's surface (EOSDIS, 2020) ... apply Categorical SS-DDPM to unconditional text generation on the text8 dataset (Mahoney, 2011). ... Finally, we evaluate SS-DDPM on CIFAR-10. |
| Dataset Splits | Yes | We use a standard 90,000,000/5,000,000/500,000 train-test-validation split and train the neural network for 512 epochs |
| Hardware Specification | Yes | time costs when using 3 NVIDIA A100 GPUs: training took approx. 112 hours and estimating NLL on the test set took approx. 2.5 hours). ... time costs when using 4 NVIDIA 1080 GPUs: training took approx. 96 hours, sampling of 50,000 images took approx. 10 hours). |
| Software Dependencies | No | The paper mentions specific software tools like "Adam", "AdamW", "NCSN++", but does not provide version numbers for these or other software dependencies required for reproducibility. |
| Experiment Setup | Yes | All models on synthetic data were trained for 350k iterations with batch size 128. ... We optimize Dirichlet SS-DDPM on the VLB objective without any modifications using Adam with a learning rate of 0.0004. The DDPM was trained on Lvlb using Adam with a learning rate of 0.0002. ... We optimize Lvlb using the AdamW optimizer with a learning rate of 0.0002 and exponential decay with γ = 0.999997. The model is trained for 2,000,000 iterations with batch size 100. For inference, we also use EMA weights with a decay of 0.9999. |
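The reported optimizer schedule (AdamW at lr 0.0002 with per-iteration exponential decay γ = 0.999997, plus EMA weights with decay 0.9999 for inference) can be sketched as follows. This is a minimal illustration of those hyperparameters, not the authors' code; the function and variable names are hypothetical.

```python
# Hedged sketch of the reported CIFAR-10 training hyperparameters
# (lr, decay, EMA); not taken from the paper's repository.

BASE_LR = 2e-4        # AdamW learning rate reported in the paper
GAMMA = 0.999997      # exponential LR decay applied per iteration
EMA_DECAY = 0.9999    # EMA decay; EMA weights are used at inference
ITERATIONS = 2_000_000
BATCH_SIZE = 100

def lr_at(step: int) -> float:
    """Learning rate after `step` iterations of exponential decay."""
    return BASE_LR * GAMMA ** step

def ema_update(ema_params, params, decay=EMA_DECAY):
    """One EMA step over flat lists of parameter values."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```

Under this schedule the learning rate decays smoothly over the 2,000,000 reported iterations (γ^2,000,000 ≈ e^-6, i.e. roughly a 400x reduction by the end of training).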