Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models

Authors: Sai Niranjan Ramachandran, Manish Krishan Lal, Suvrit Sra

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We next present case studies that show how our machinery can diagnose and subsequently improve the behaviour of diffusion samplers on practical tasks. Our aim is to illustrate simple recipes for working with user-defined events that need not require a deep dive into the underlying theory but at the same time showcase the utility of our framework. 4.1 Warm-up: Predicting convergence of the data distribution 4.2 Class-conditional generation 4.3 Rare-class generation 4.4 Zero-shot classification 4.5 Zero-shot style transfer Table 1: Acceleration achieved by stopping reverse diffusion at t = i instead of t = n (DDPM schedule). FID scores are averaged over three runs with 95% confidence intervals. Table 2: Class conditional generation Table 4: Zero-shot multi-class accuracy (%); 95% CI over five runs.
Researcher Affiliation	Academia	School of Computation, Information and Technology, Technical University of Munich, Germany Munich Center for Machine Learning (MCML)
Pseudocode	Yes	The proposed Algorithm 1 systematically identifies discrete transitions in cross-fluctuations to characterise the sampling dynamics of desired outcomes. This enables precise intervention to enhance the generation and likelihood of desirable samples, thereby improving overall model performance.
Open Source Code	Yes	We conducted all experiments using a single Nvidia A100 GPU and provided sample code for reproducibility.
Open Datasets	Yes	We evaluate MNIST [Deng, 2012], CIFAR-10 [Krizhevsky et al., 2009], and a compressed INT-8 variant of ImageNet [Deng et al., 2009, Ryu, 2024], using the popular DDPM noise schedule [Ho et al., 2020]. CUB-200 (200 bird species), and iNaturalist 2019 (fine-grained flora & fauna).
Dataset Splits	Yes	Pick two ImageNet classes {λ, µ} at random and sample min{card(λ), card(µ), 10000} number of images for each class. At each step t, we extract the noisy embedding z_t (VP schedule) and train a linear MLP on 80% of the embeddings; the rest forms the test set.
Hardware Specification	Yes	We conducted all experiments using a single Nvidia A100 GPU and provided sample code for reproducibility.
Software Dependencies	No	Our implementation builds on open-source code from Hugging Face (diffusers library) and publicly available code from Kynkäänniemi et al. [2024], Li et al. [2023], Peebles and Xie [2022]. Our method is plug-and-play, requiring simple hyperparameter adjustments for these techniques without any major code modifications.
Experiment Setup	Yes	Following Peebles and Xie [2022] we set w = 1.5 for DiT-XL/2. Stable Diffusion requires stronger guidance; we use a fixed w ∈ [3.5, 4.5] per dataset. For every dataset we sweep t_end,c ∈ {0.1T, 0.2T, . . . , 0.8T}, t_start,c ∈ {0.2T, . . . , T}, under the constraint t_start,c > t_end,c, yielding 44 admissible pairs. We fix N = 250 for all of our experiments following Li et al. [2023].
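
The sweep quoted in the Experiment Setup row can be checked by enumerating the grid. The sketch below is illustrative only and is not taken from the authors' code: the names T_END_TENTHS, T_START_TENTHS and admissible_pairs are hypothetical, and the grid points are stored as integer tenths of T (an assumption made here to avoid floating-point comparisons).

```python
from itertools import product

# Grid points expressed as integer tenths of the total step count T
# (hypothetical representation; the quoted text states them as fractions of T).
T_END_TENTHS = range(1, 9)     # t_end,c   in {0.1T, 0.2T, ..., 0.8T}
T_START_TENTHS = range(2, 11)  # t_start,c in {0.2T, 0.3T, ..., T}


def admissible_pairs():
    """Return all (t_start, t_end) pairs, in tenths of T, with t_start > t_end."""
    return [
        (start, end)
        for end, start in product(T_END_TENTHS, T_START_TENTHS)
        if start > end
    ]


pairs = admissible_pairs()
print(len(pairs))  # 44, matching the count quoted in the table
```

For each t_end,c = 0.1T, ..., 0.8T there are 9, 8, ..., 2 valid choices of t_start,c respectively, and 9 + 8 + ... + 2 = 44.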