Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Ambient Diffusion Omni: Training Good Models with Bad Data

Authors: Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans, Antonio Torralba, Constantinos Daskalakis

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiments Controlled experiments to show utility from low-quality data. To verify our method, we first do synthetic experiments on artificially corrupted data. We use EDM [35] as our baseline, and we train networks on CIFAR-10 and FFHQ. ... For evaluation, we measure FID [29] with respect to the full uncorrupted dataset (which is not available during training).
Researcher Affiliation	Academia	Giannis Daras, Massachusetts Institute of Technology; Adrian Rodriguez-Munoz, Massachusetts Institute of Technology; Adam Klivans, The University of Texas at Austin; Antonio Torralba, Massachusetts Institute of Technology; Constantinos Daskalakis, Massachusetts Institute of Technology
Pseudocode	No	The paper describes the algorithms in text, for example in Section 4, "Algorithm 1. Algorithm 1 trains a diffusion model using access to n1 samples from a target density p0..." and "Algorithm 2. Algorithm 2 trains a diffusion model using access to n1 + n2 samples from a density p0...". There are no explicit pseudocode blocks or figures labeled as such.
Open Source Code	Yes	We release our code and models: https://github.com/giannisdaras/ambient-omni.
Open Datasets	Yes	We use EDM [35] as our baseline, and we train networks on CIFAR-10 and FFHQ. ... CIFAR-10 [43] consists of 60,000 32x32 images of ten classes ... FFHQ [37] consists of 70,000 512x512 images of faces from Flickr. ... AFHQ [12] consists of 5,653 images of cats, 5,239 images of dogs and 5,000 images of wildlife ... ImageNet [20] consists of 1,281,167 images of variable resolution from 1000 classes. ... Conceptual Captions [56] consists of 12M (image url, caption) pairs. ... Segment Anything [42] consists of 11.1M high-resolution images annotated with segmentation masks. ... JourneyDB consists of 4.4M synthetic image-caption pairs from Midjourney [63]. ... DiffusionDB consists of 14M synthetic image-caption pairs, mostly generated from Stable Diffusion models [70].
Dataset Splits	Yes	In a controlled experiment with restricted access only to 10% of the clean dataset, our method of Ambient-o uses corrupted and out-of-distribution data to improve performance. ... For the blurring experiments, we use a Gaussian kernel with standard deviation σB = 0.4, 0.6, 0.8, 1.0, and we corrupt 90% of the data. ... Table 8: Effect of clean data proportion on FID. Increasing the fraction of clean data substantially improves FID scores. Clean Data (%) | FID: 1% | 21.9; 5% | 12.9; 10% | 6.2; 30% | 2.8; 50% | 2.4
Hardware Specification	Yes	On a single 8x V100 node we achieved a throughput of 0.8s per 1k images, for an average of 4.4h per training run. ... On 32 H200 GPUs, XS models took ~3 days to train, while XXL models took ~7 days.
Software Dependencies	No	The paper mentions software like "EDM [35] codebase", "EDM2 [36] codebase", "MicroDiffusion codebase [54]", "Adam optimizer [41]", and "AdamW optimizer [41]". However, it does not specify explicit version numbers for these software components or any other libraries like Python, PyTorch, or CUDA.
Experiment Setup	Yes	We use the Adam optimizer [41] with learning rate 0.001, batch size 512, and no weight decay. ... Same as for CIFAR-10, except learning was set to 2e-4, we trained for a maximum of 100 × 10^6 images worth of training, and saw best results around 30 × 10^6 images worth. ... We use the Adam optimizer [41] with reference learning rates 0.012, batch size 2048, and no weight decay. ... We use the AdamW optimizer [41] with reference learning rates 2.4e-4/8e-5/8e-5/8e-5 for each of the four phases and batch size 2048 for all phases.
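
The Dataset Splits row quotes a controlled setup in which 10% of the data stays clean and the remaining 90% is corrupted by Gaussian blur. The sketch below shows one way such a partition could be drawn reproducibly. It is an illustration only, not the authors' code: the function name `split_clean_corrupted`, the `seed` parameter, and the constant `BLUR_SIGMAS` are hypothetical, and the blur itself is not applied here.

```python
import random


def split_clean_corrupted(num_samples, clean_fraction=0.1, seed=0):
    """Partition sample indices into a clean subset and a subset to corrupt.

    Hypothetical helper: the paper's released code may split differently.
    """
    if not 0.0 <= clean_fraction <= 1.0:
        raise ValueError("clean_fraction must be in [0, 1]")
    indices = list(range(num_samples))
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(indices)
    num_clean = round(num_samples * clean_fraction)
    clean = sorted(indices[:num_clean])
    corrupted = sorted(indices[num_clean:])
    return clean, corrupted


# Blur standard deviations (sigma_B) quoted in the Dataset Splits row.
BLUR_SIGMAS = (0.4, 0.6, 0.8, 1.0)

# Example: a 60,000-image dataset with 10% kept clean.
clean_idx, corrupted_idx = split_clean_corrupted(60_000, clean_fraction=0.1)
print(len(clean_idx), len(corrupted_idx))  # prints "6000 54000"
```

Fixing the seed makes the clean and corrupted subsets identical across runs, which matters when comparing FID across blur levels or clean-data fractions.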