Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
Authors: Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conduct a comprehensive theoretical and empirical investigation of this phenomenon. Leveraging the inherent low-dimensionality structure of image data, we theoretically demonstrate that the unimodal dynamic emerges when the diffusion model successfully captures the underlying data distribution. The unimodality arises from an interplay between denoising strength and class confidence across noise scales. Empirically, we further show that, in classification tasks, the presence of unimodal dynamics reliably reflects the diffusion model's generalization: it emerges when the model generates novel images and gradually transitions to a monotonically decreasing curve as the model begins to memorize the training data. |
| Researcher Affiliation | Academia | Xiao Li University of Michigan EMAIL Zekai Zhang University of Michigan EMAIL Xiang Li University of Michigan EMAIL Siyi Chen University of Michigan EMAIL Zhihui Zhu Ohio State University EMAIL Peng Wang University of Michigan EMAIL Qing Qu University of Michigan EMAIL |
| Pseudocode | No | The paper describes mathematical frameworks and network parameterizations through equations and textual descriptions, but it does not contain any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | Our work uses publicly available datasets (such as CIFAR) and codebases (such as EDM). We primarily utilize the codebase of EDM [49], which is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license. We also use code from the GitHub repository accompanying [19], which is licensed under the MIT License. |
| Open Datasets | Yes | For the datasets, CIFAR datasets [50], ImageNet [51], Oxford 102 Flowers [93], and DTD [96] are publicly available for academic use. The CelebA dataset [97] is released for non-commercial research purposes only under a custom license. Oxford-IIIT Pet [91] is available under the CC BY-SA 4.0 license. The FFHQ dataset [98] is distributed by NVIDIA under the CC BY-NC-SA 4.0 license. |
| Dataset Splits | Yes | Experimental details for Figure 6. We use the DDPM++ network and VP configuration to train diffusion models [49] on the CIFAR10 dataset, using two network configurations, UNet-32 and UNet-64, obtained by varying the embedding dimension of the UNet. Training dataset sizes range exponentially from 2^8 to 2^15. For each dataset size, both UNet-32 and UNet-64 are trained on the same subset of the training data. ... Experimental details for Figure 7 and Figure 8. We use the DDPM++ architecture with the EDM configuration to train a UNet-128 diffusion model [49] on CIFAR10 and CIFAR100, using 4096-image training subsets. |
| Hardware Specification | Yes | Computational resources. Most experiments are conducted on a single NVIDIA A40 GPU, except for training on subsets of images (e.g., Figure 8), which is performed using two A40 GPUs. |
| Software Dependencies | No | The paper mentions utilizing codebases like EDM [49] and a GitHub repository accompanying [19], and using the Adam optimizer [101], but it does not provide specific version numbers for any of these software components or underlying programming languages/libraries. |
| Experiment Setup | Yes | Experimental details for Figure 3 and Figure 15. For the MoLRG experiments, we train our parameterized model (4) following the setup provided in an open-source repository [100]. The model is trained on a d = 5, n = 50, K = 3 and δ = 0.2 MoLRG dataset containing 12000 samples. Training is conducted for 200 epochs using DDPM scheduling with T = 500, employing the Adam optimizer with a learning rate of 5e-4. |
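
The hyperparameters quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration record, which may help anyone trying to reproduce the runs. The sketch below is illustrative only: the class name `MoLRGSetup`, its field names, and the helper `subset_sizes` are hypothetical and do not come from the paper or the EDM codebase. The symbols d, n, K and δ are kept as quoted, with their meaning left to the paper. It also assumes that "range exponentially from 2^8 to 2^15" means every power of two in that range, which the quoted text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoLRGSetup:
    """Values quoted in the Experiment Setup row (Figures 3 and 15).

    Hypothetical container; field names are illustrative, not the authors'.
    """
    d: int = 5                  # "d = 5" as quoted
    n: int = 50                 # "n = 50" as quoted
    K: int = 3                  # "K = 3" as quoted
    delta: float = 0.2          # "δ = 0.2" as quoted
    num_samples: int = 12000    # size of the MoLRG dataset
    epochs: int = 200           # training epochs
    T: int = 500                # DDPM scheduling steps
    learning_rate: float = 5e-4
    optimizer: str = "Adam"


def subset_sizes(min_exp: int = 8, max_exp: int = 15) -> list[int]:
    """Training subset sizes from 2^min_exp to 2^max_exp.

    Assumption: one size per power of two; the source only gives the endpoints.
    """
    return [2 ** e for e in range(min_exp, max_exp + 1)]


if __name__ == "__main__":
    print(MoLRGSetup())
    print(subset_sizes())
```

Under this assumption, the 4096-image subsets used for Figures 7 and 8 correspond to one of the sizes in the list (2^12).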