Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Elucidating the Exposure Bias in Diffusion Models
Authors: Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, Itir Onal Ertugrul
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various diffusion frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler, obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation. |
| Researcher Affiliation | Collaboration | Mang Ning (Utrecht University), Mingxiao Li (KU Leuven), Jianlin Su (Moonshot AI Ltd.), Albert Ali Salah (Utrecht University), Itir Onal Ertugrul (Utrecht University) |
| Pseudocode | Yes | Algorithm 1 Variance error under single-step sampling... Algorithm 2 Variance error under multi-step sampling... Algorithm 3 Measurement of Exposure Bias δ_t |
| Open Source Code | Yes | The code is at https://github.com/forever208/ADM-ES |
| Open Datasets | Yes | Experiments on various diffusion frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler, obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation... CIFAR-10 (Krizhevsky et al., 2009), LSUN tower (Yu et al., 2015) and FFHQ (Karras et al., 2019)... CelebA 64×64 datasets (Liu et al., 2015)... ImageNet 256×256 |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) used for training the models or baselines evaluated. |
| Hardware Specification | No | The paper does not mention any specific hardware specifications (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We present the complete parameters k, b used in all experiments and the details on the search of k, b in Appendix A.10. Overall, searching for the optimal uniform λ(t) is effortless and takes 6 to 10 trials. In Appendix A.11, we also demonstrate that the FID gain can be achieved within a wide range of λ(t), which indicates the insensitivity of λ(t)... Search for the optimal uniform schedule λ(t) = b in a coarse-to-fine manner: use stride 0.001, 0.0005, 0.0001 progressively. |
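The Experiment Setup row describes a coarse-to-fine search for a uniform schedule λ(t) = b, using strides of 0.001, 0.0005, and 0.0001 in turn. The sketch below shows one plausible reading of that procedure. It is not the authors' code. Several details are assumptions: the function name `coarse_to_fine_search`, the `evaluate` callback (standing in for "sample with this b and compute FID", where lower is better), the starting value, and the rule "step toward the better neighbour until neither neighbour improves, then shrink the stride". A cache ensures that no value of b is evaluated twice.

```python
from typing import Callable, Dict, Sequence, Tuple


def coarse_to_fine_search(
    evaluate: Callable[[float], float],
    start: float,
    strides: Sequence[float] = (0.001, 0.0005, 0.0001),
    max_steps: int = 50,
) -> Tuple[float, float, int]:
    """Hypothetical coarse-to-fine search over a scalar b (lower score is better).

    Returns (best_b, best_score, number_of_distinct_evaluations).
    """
    cache: Dict[float, float] = {}

    def score(b: float) -> float:
        # Round to suppress floating-point drift so equal b values share a cache entry.
        key = round(b, 6)
        if key not in cache:
            cache[key] = evaluate(key)
        return cache[key]

    best = round(start, 6)
    for stride in strides:  # coarse strides first, then finer ones
        for _ in range(max_steps):
            # Probe one stride to each side of the current best value.
            left, right = round(best - stride, 6), round(best + stride, 6)
            candidate = min((left, right), key=score)
            if score(candidate) < score(best):
                best = candidate  # move toward the improving neighbour
            else:
                break  # no neighbour improves at this stride; refine
    return best, score(best), len(cache)


if __name__ == "__main__":
    # Toy stand-in for an FID evaluation with an optimum at b = 1.0173 (illustrative only).
    b, s, n = coarse_to_fine_search(lambda b: (b - 1.0173) ** 2, start=1.0)
    print(b, s, n)
```

In a real run each call to `evaluate` means generating samples and computing FID, which is expensive. This is why the sketch caches results and why a search that starts close to the optimum matters: the row above reports that the search takes 6 to 10 trials.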