Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning to Discretize Denoising Diffusion ODEs

Authors: Vinh Tong, Trung-Dung Hoang, Anji Liu, Guy Van den Broeck, Mathias Niepert

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method with extensive experiments on 7 pre-trained models, covering unconditional and conditional sampling in both pixel-space and latent-space DPMs. We achieve FIDs of 2.38 (10 NFE), and 2.27 (10 NFE) on unconditional CIFAR10 and AFHQv2 in 5-10 minutes of training.
Researcher Affiliation	Academia	¹University of Stuttgart, ²IMPRS-IS, ³UCLA, ⁴University of Bern
Pseudocode	Yes	Algorithm 1 LD3
Open Source Code	Yes	Code is available at https://github.com/vinhsuhi/LD3.
Open Datasets	Yes	We evaluate 7 pre-trained diffusion models across different domains. For pixel space models, we include CIFAR10 (32 × 32) (Krizhevsky & Hinton, 2009), FFHQ (64 × 64) (Karras et al., 2019), and AFHQv2 (64 × 64) (Choi et al., 2020). For latent space models, we assess LSUN-Bedroom (256 × 256) (Yu et al., 2015) and class-conditional ImageNet (256 × 256) (Russakovsky et al., 2015).
Dataset Splits	Yes	For CIFAR10, FFHQ, and AFHQv2, we use 100 samples for both training and validation and train LD3 for 7 epochs with a batch size of 2. ... For Latent Diffusion (Rombach et al., 2022) on ImageNet and LSUN-Bedroom, we use 100 samples for both training and validation, with the training conducted over 5 epochs. ... We evaluate our model using the FID score with 50,000 randomly generated samples.
Hardware Specification	Yes	For instance, at 10 NFE, our model needs approximately 36 minutes on a single NVIDIA A100 GPU, whereas AYS requires 3 to 4 hours on 8 NVIDIA RTX6000s.
Software Dependencies	No	The paper mentions several solvers, such as DPM-Solver++, UniPC, and iPNDM, and uses codebases from other papers (e.g., Luo & Hu, 2021). However, it does not specify version numbers for the general software libraries or programming languages used in the authors' own implementation, such as Python, PyTorch, or CUDA.
Experiment Setup	Yes	For CIFAR10, FFHQ, and AFHQv2, we use 100 samples for both training and validation and train LD3 for 7 epochs with a batch size of 2. We set r proportional to the dimensionality d and inversely proportional to the squared NFE: r = γd / NFE², where γ = 0.001 in all experiments. ... RMSprop for ξ and SGD for both ξ_c and x_T. The learning rates are denoted as l_ξ, l_{ξ_c}, and l_{x_T}. We set l_ξ = 0.005 for pixel space datasets and l_ξ = 0.001 for latent space datasets, while l_{ξ_c} and l_{x_T} are NFE-dependent.
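
As a worked illustration of the rule quoted in the Experiment Setup row, the sketch below evaluates r = γd / NFE² for a few step budgets. It is not taken from the LD3 repository: the function name `compute_r` is hypothetical, and it assumes that the dimensionality d counts every element of a CIFAR10 image tensor (3 × 32 × 32 = 3072), which the quoted text does not state explicitly.

```python
def compute_r(dim: int, nfe: int, gamma: float = 0.001) -> float:
    """Evaluate r = gamma * d / NFE^2 (hypothetical helper, not from the LD3 code)."""
    if dim <= 0 or nfe <= 0:
        raise ValueError("dim and nfe must be positive")
    return gamma * dim / nfe ** 2


# Assumption: d is the number of elements in one CIFAR10 image (channels * height * width).
CIFAR10_DIM = 3 * 32 * 32

if __name__ == "__main__":
    for nfe in (5, 10, 20):
        # r shrinks quadratically as the number of function evaluations grows.
        print(f"NFE={nfe:2d}  r={compute_r(CIFAR10_DIM, nfe):.5f}")
```

Under these assumptions, 10 NFE gives r = 0.001 × 3072 / 100 ≈ 0.0307, and halving the NFE to 5 makes r four times larger.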