Improved Techniques for Training Score-Based Generative Models
Authors: Yang Song, Stefano Ermon
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a new theoretical analysis of learning and sampling from score-based models in high dimensional spaces, explaining existing failure modes and motivating new solutions that generalize across datasets. To enhance stability, we also propose to maintain an exponential moving average of model weights (see the EMA sketch after this table). With these improvements, we can scale score-based generative models to various image datasets, with diverse resolutions ranging from 64×64 to 256×256. Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets, including CelebA, FFHQ, and several LSUN categories. |
| Researcher Affiliation | Academia | Yang Song Computer Science Department Stanford University yangsong@cs.stanford.edu Stefano Ermon Computer Science Department Stanford University ermon@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 Annealed Langevin dynamics [1] (see the sampling sketch after this table) |
| Open Source Code | No | No explicit statement about releasing the source code for the methodology described in this paper or a link to a code repository was found. |
| Open Datasets | Yes | Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets, including CelebA, FFHQ, and several LSUN categories. (with citations [2], [16], [27] pointing to CIFAR-10, CelebA, and LSUN datasets respectively) |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, nor does it describe cross-validation. It mentions training and test data, but no explicit validation split. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running the experiments were provided. The paper does not specify the computational resources. |
| Software Dependencies | No | No specific ancillary software details with version numbers (e.g., library names with versions) needed to replicate the experiment were provided. The paper does not list software dependencies with their versions. |
| Experiment Setup | Yes | For a complete description of experimental details and more results, please refer to Appendix B and C. (From Appendix B.1: The initial learning rate is 0.0001, which decays by 0.9999 for every 10000 steps. We use the Adam [26] optimizer with β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁸. The batch size is 128. We train the NCSN models for 1000000 iterations for CelebA 64×64 and CIFAR-10 32×32.) (see the optimizer sketch after this table) |
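
The Research Type row quotes the paper's proposal to maintain an exponential moving average (EMA) of model weights. A minimal sketch of how such an average could be maintained during training is shown below; the helper names and the decay rate of 0.999 are illustrative assumptions, not values taken from the paper.

```python
import copy
import torch

def init_ema(model):
    """Create a frozen copy of the model to hold the averaged weights."""
    ema_model = copy.deepcopy(model)
    for p in ema_model.parameters():
        p.requires_grad_(False)
    return ema_model

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    """theta_ema <- decay * theta_ema + (1 - decay) * theta, applied after each optimizer step."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)
```

At evaluation or sampling time, the EMA copy of the weights would be used in place of the raw training weights.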
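
The Pseudocode row points to Algorithm 1, annealed Langevin dynamics. The sketch below illustrates the general sampling loop under stated assumptions: `score_fn(x, sigma)` is assumed to return the estimated score ∇ₓ log pσ(x), `sigmas` is a decreasing list of noise levels, and the default hyperparameters are placeholders rather than the paper's settings.

```python
import torch

@torch.no_grad()
def annealed_langevin_dynamics(score_fn, x, sigmas, n_steps_each=5, step_lr=2e-5):
    """Sketch of annealed Langevin dynamics sampling (largest noise level first)."""
    for sigma in sigmas:
        # Scale the step size with the noise level: alpha_i = step_lr * (sigma_i / sigma_L)^2.
        alpha = step_lr * (sigma / sigmas[-1]) ** 2
        for _ in range(n_steps_each):
            z = torch.randn_like(x)
            # Langevin update: gradient ascent on log density plus injected Gaussian noise.
            x = x + 0.5 * alpha * score_fn(x, sigma) + (alpha ** 0.5) * z
    return x
```

Sampling would start from `x` drawn from a simple prior (for example, uniform or Gaussian noise matched to the largest noise level) and anneal down through the noise scales.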
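
The Appendix B.1 settings quoted in the Experiment Setup row translate roughly into the optimizer configuration below. `model` and the data pipeline are assumed, and the step-decay schedule is one plausible reading of "decays by 0.9999 for every 10000 steps".

```python
import torch

# Assumes `model` is the NCSN score network; batches of 128 come from the data loader.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,             # initial learning rate from Appendix B.1
    betas=(0.9, 0.999),  # beta_1, beta_2
    eps=1e-8,
)
# One plausible reading of the quoted schedule: multiply the learning rate
# by 0.9999 every 10000 training steps.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.9999)
```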