On the Generalization Properties of Diffusion Models
Authors: Puheng Li, Zhong Li, Huishuai Zhang, Jiang Bian
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our findings contribute to the rigorous understanding of diffusion models' generalization properties and provide insights that may guide practical applications. ... Furthermore, these estimates are not solely theoretical constructs but have also been confirmed through numerical simulations. |
| Researcher Affiliation | Collaboration | Puheng Li* (Department of Statistics, Stanford University, puhengli@stanford.edu); Zhong Li* (Machine Learning Group, Microsoft Research Asia, lzhong@microsoft.com); Huishuai Zhang (Machine Learning Group, Microsoft Research Asia, huzhang@microsoft.com); Jiang Bian (Machine Learning Group, Microsoft Research Asia, jiabia@microsoft.com) |
| Pseudocode | No | The paper provides mathematical derivations and descriptions of the model but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/lphLeo/Diffusion_Generalization |
| Open Datasets | Yes | In this subsection, we verify our results on the MNIST dataset using the standard U-net architecture as the score network, which suggests that the adverse effect of modes shift on the generalization performance of diffusion models also appears in general. |
| Dataset Splits | No | The paper mentions training on datasets but does not explicitly provide details about training/validation/test splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'SGD optimizer' and 'Adam optimizer', a 'U-net architecture', and a 'one-hidden-layer neural network with Swish activations', but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | We select the one-hidden-layer neural network with Swish activations as the score network, which is trained using the SGD optimizer with a fixed learning rate 0.5. The target distribution is set to be a one-dimensional 2-mode Gaussian mixture with the modes distance equalling 6, and the number of data samples is 1000. ... All the configurations remain the same as Section 4.1 except that the learning rate is now 10^-3. (A minimal code sketch of this setup follows the table.) |
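For concreteness, below is a minimal sketch of the toy setup quoted in the Experiment Setup row: a one-hidden-layer Swish score network trained with SGD (fixed learning rate 0.5) on 1000 samples from a one-dimensional 2-mode Gaussian mixture whose modes are 6 apart. The hidden width, noise scale `sigma`, step count, and the single-noise-level denoising score matching objective are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the paper's toy experiment. Hidden width, noise
# scale sigma, and step count are assumptions, not values from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1000 samples from a 1-D two-mode Gaussian mixture whose modes are 6 apart
# (modes at -3 and +3, unit-variance components, equal mixture weights).
n = 1000
signs = 2 * torch.randint(0, 2, (n, 1)).float() - 1  # -1 or +1 per sample
data = 3.0 * signs + torch.randn(n, 1)

# One-hidden-layer score network with Swish (SiLU) activations.
score_net = nn.Sequential(nn.Linear(1, 128), nn.SiLU(), nn.Linear(128, 1))
opt = torch.optim.SGD(score_net.parameters(), lr=0.5)  # fixed lr 0.5, as quoted

sigma = 0.5  # assumed perturbation scale for denoising score matching
for step in range(2000):
    noise = torch.randn_like(data)
    noisy = data + sigma * noise
    target = -noise / sigma  # score of the Gaussian perturbation kernel
    loss = ((score_net(noisy) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: DSM loss {loss.item():.4f}")
```

Swish corresponds to PyTorch's `nn.SiLU`; the single fixed noise level above stands in for the time-dependent score matching objective that diffusion-model training actually uses, which the sketch does not attempt to reproduce.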