Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Algorithm- and Data-Dependent Generalization Bounds for Diffusion Models

Authors: Benjamin Dupuis, Dario Shariatian, Maxime Haddouche, Alain Durmus, Umut Simsekli

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our theoretical findings are supported by empirical results on several datasets. Experimental validation. We design low and high dimensional experiments to validate our theory on different algorithms, varying optimizers (SGLD, ADAM), learning rates and batch sizes.
Researcher Affiliation	Academia	1 INRIA, France 2 CNRS, Ecole Normale Supérieure, PSL Research University, France 3 Ecole Polytechnique, CMAP, IP Paris, France
Pseudocode	No	The paper describes algorithms like the SGLD recursion (Section 4.2) and the Euler EI scheme (Equation 6) using mathematical notation and descriptive text, but it does not present them within clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	We make our implementation publicly available at https://github.com/darioShar/Generalization-Diffusion-Models.
Open Datasets	Yes	we present simple experiments illustrating the concrete impact of optimization hyperparameters on the generalization ability of the generated distribution... a Gaussian mixture model, MNIST [LBBH98], and the butterflies dataset [WME09]... we then shift focus to higher-dimensional datasets, the flowers dataset [NZ06], and the butterflies dataset [WME09].
Dataset Splits	Yes	For MNIST, we compute the FID using 2,048 generated images compared against 2,048 real images, once using the training set (train FID) and once using the test set (test FID)... The butterflies dataset consists of 702 training images and 130 test images; the flowers dataset consists of 1,020 training images (combining train and validation sets), and 340 test images.
Hardware Specification	Yes	Experiments are conducted using 8 NVIDIA A100 GPUs.
Software Dependencies	No	All experiments are implemented using PyTorch. Our implementation relies on the U-Net architecture from [DN21], available at https://github.com/openai/improved-diffusion, and with configurations described in Table 1. The optimization is carried out using Stochastic Gradient Langevin Dynamics (SGLD) with no momentum or weight decay, using the torch-sgld package. To evaluate generative quality, we compute the Wasserstein-2 (W2) distance between the generated and target data distributions... using the pyemd package [Las17], with default settings. While several software packages and libraries (PyTorch, U-Net, torch-sgld, pyemd) are mentioned, specific version numbers for these dependencies are not provided in the paper.
Experiment Setup	Yes	We vary learning rates and batch sizes to obtain multiple measure points and validate our bounds... The optimization is carried out using Stochastic Gradient Langevin Dynamics (SGLD) with no momentum or weight decay... We vary the inverse temperature parameter β ∈ {10^4, 10^6, 10^10}... We also sweep over the step size, batch size, and dataset size, with batch size equal to the number of samples. Specifically: Step size ∈ {2e-4, 5e-4, 1e-3, 2e-3, 5e-3}, Total number of samples (= batch size) ∈ {512, 1024, 2048, 4096, 8192}... We use the Adam optimizer [KB17] for training... We vary the learning rate and batch size across: Learning rates: {5e-6, 1e-5, 2e-5, 1e-4, 2e-4}, Batch sizes: {4, 16, 64, 128}.
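
The sweeps quoted in the Experiment Setup row can be written down as explicit grids. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes that every combination of the listed values was run (a full Cartesian product), which the quoted text does not confirm. The noise-scale helper uses the common SGLD convention in which Gaussian noise of standard deviation sqrt(2 · step size / β) is added at each step; the paper's exact parameterization may differ.

```python
import itertools
import math

# Values as quoted in the Experiment Setup row above.
SGLD_INVERSE_TEMPERATURES = [1e4, 1e6, 1e10]
SGLD_STEP_SIZES = [2e-4, 5e-4, 1e-3, 2e-3, 5e-3]
SGLD_NUM_SAMPLES = [512, 1024, 2048, 4096, 8192]

ADAM_LEARNING_RATES = [5e-6, 1e-5, 2e-5, 1e-4, 2e-4]
ADAM_BATCH_SIZES = [4, 16, 64, 128]


def sgld_grid():
    """Enumerate SGLD configurations.

    Assumption: a full Cartesian product over the three lists.
    The batch size is set equal to the number of samples, as stated in the row.
    """
    return [
        {"beta": beta, "step_size": eta, "num_samples": n, "batch_size": n}
        for beta, eta, n in itertools.product(
            SGLD_INVERSE_TEMPERATURES, SGLD_STEP_SIZES, SGLD_NUM_SAMPLES
        )
    ]


def adam_grid():
    """Enumerate Adam configurations (assumption: full Cartesian product)."""
    return [
        {"learning_rate": lr, "batch_size": bs}
        for lr, bs in itertools.product(ADAM_LEARNING_RATES, ADAM_BATCH_SIZES)
    ]


def sgld_noise_scale(step_size, beta):
    """Standard deviation of the injected Gaussian noise under the common
    convention theta <- theta - eta * grad + sqrt(2 * eta / beta) * xi.
    Assumption: the paper may use a different scaling."""
    return math.sqrt(2.0 * step_size / beta)


if __name__ == "__main__":
    print(len(sgld_grid()), "SGLD configurations")
    print(len(adam_grid()), "Adam configurations")
```

Under the Cartesian-product assumption, the two grids contain 3 × 5 × 5 and 5 × 4 configurations respectively; the actual number of runs reported in the paper may be smaller.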