Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

EVODiff: Entropy-aware Variance Optimized Diffusion Inference

Authors: Shigui Li, Wei Chen, Delu Zeng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to the DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts.
Researcher Affiliation	Academia	Shigui Li School of Mathematics South China University of Technology Guangzhou, China EMAIL Wei Chen School of Mathematics South China University of Technology Guangzhou, China EMAIL Delu Zeng School of Electronic and Information Engineering South China University of Technology, Guangzhou, China; Department of Electrical and Computer Engineering University of Waterloo, Waterloo, Canada EMAIL
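Pseudocode	Yes	Algorithm 1 EVODiff: Optimizing Denoising Variance of Diffusion Model Inference. Require: initial $x_T$, time schedule $\{t_i\}_{i=0}^{N}$, model $x_\theta$. 1: $x_{t_N} \leftarrow x_T$, $h_{t_i} := \kappa(t_{i-1}) - \kappa(t_i)$, $r_{\log\mathrm{SNR}}(i) := \frac{\log \kappa(t_i) - \log \kappa(t_{i+1})}{\log \kappa(t_{i-1}) - \log \kappa(t_i)}$. 2: Denote $g(x_{t_i}) := \frac{\sigma_{t_{i-1}}}{\sigma_{t_i}} x_{t_i} + \sigma_{t_{i-1}} h_{t_i} x_\theta(x_{t_i}, t_i)$. # Euler's or DDIM's iteration. 3: for $i \leftarrow N$ to $1$ do 4: $x_{t_i} \leftarrow g(x_{t_{i+1}})$. 5: $x_{t_{i-1}} \leftarrow g(x_{t_i}) + \sigma_{t_{i-1}} \frac{h_{t_i}^2}{2} B_\theta(t_i, l_i)$. 6: $B_\theta(t_i) \leftarrow \left(1 - \frac{\eta_i}{2}\right) B_\theta(s_i, t_i) + \frac{\eta_i}{2} B_\theta(t_i, l_i)$, where $\eta_i$ is refined by Eq. (25). 7: $x_{t_{i-1}} \leftarrow g(x_{t_i}) + \sigma_{t_{i-1}} \frac{h_{t_i}^2}{2\zeta_i} B_\theta(t_i)$, where $\zeta_i$ is refined by Eq. (25). 8: end for Ensure: $x_0$.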
Open Source Code	Yes	Code is available at https://github.com/ShiguiLi/EVODiff.
Open Datasets	Yes	We experimentally validate our method on a diverse suite of DMs and datasets, including CIFAR-10, CelebA-64, FFHQ-64, ImageNet-64, ImageNet-256, and LSUN-Bedrooms. Our evaluation uses standard metrics such as Fréchet Inception Distance (FID) and Inception Score (IS) across a varying number of function evaluations (NFEs).
Dataset Splits	Yes	Table 2: Quantitative results of FID and IS scores for gradient-based methods on ImageNet-256, FFHQ-64, and CIFAR-10. The results are evaluated on 10k and 50k samples for various NFEs. The DPM-Solver++ is our baseline.
Hardware Specification	Yes	All experiments were conducted on NVIDIA GPUs. For high-dimensional datasets like ImageNet, we utilized the NVIDIA GeForce RTX 3090 GPU with 24GB VRAM. For other cases like CIFAR-10, experiments were performed on NVIDIA TITAN X (Pascal) with 12GB VRAM.
Software Dependencies	No	The paper lists various codebases used for baselines (Table 7: ScoreSDE, EDM, Guided-Diffusion, Latent-Diffusion, Stable-Diffusion, DPM-Solver, DPM-Solver++, SciRE-Solver, UniPC, DPM-Solver-v3). However, it does not explicitly state the version numbers for the ancillary software (e.g., Python, PyTorch, CUDA) that they used for their implementation. It only provides URLs to repositories, which might contain version information, but the paper itself doesn't explicitly state them in the text.
Experiment Setup	Yes	Table 9: We conducted ablation experiments with different shift parameters in EVODiff 1, using the pre-trained model [4] on ImageNet-256×256 [76]. We report the FID evaluated on 10k samples for various NFEs and guidance scales. Table 11: We conducted ablation experiments under different guidance scales and different random seeds. Quantitative results of the gradient estimation-based denoising iterations using the pre-trained model [4] on ImageNet-256×256 [76]. We report the FID for 10k samples evaluated under various NFEs. Figure 5: Random samples from the Stable-Diffusion-v1.5 model [28] with a guidance scale of 7.5, using varying NFEs and the prompt "Giant caterpillar riding a bicycle".
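
The relative improvements quoted in the Research Type row (FID from 5.10 to 2.78, NFE from 20 to 15) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `relative_reduction` is hypothetical and is not taken from the paper or the EVODiff repository.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage reduction when going from `before` to `after`."""
    if before == 0:
        raise ValueError("`before` must be non-zero")
    return (before - after) / before * 100.0


# Figures quoted in the Research Type row above.
fid_gain = relative_reduction(5.10, 2.78)   # FID on CIFAR-10 at 10 NFE
nfe_saving = relative_reduction(20, 15)     # NFE cost on ImageNet-256

print(f"FID reduction: {fid_gain:.1f}%")
print(f"NFE reduction: {nfe_saving:.1f}%")
```

Rounded to one decimal place, both values agree with the 45.5% and 25% figures quoted from the paper.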