Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models

Authors: Vineet Jain, Kusha Sareen, Mohammad Pedramfar, Siamak Ravanbakhsh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	On MNIST and CIFAR-10 class-conditional generation, DTS matches the FID of the best-performing baseline with up to 10× less compute. In text-to-image generation and language completion tasks, DTS effectively searches for high reward samples that match best-of-N with up to 5× less compute. By reusing information from previous generations, we get an anytime algorithm that turns additional compute into steadily better samples, providing a scalable approach for inference-time alignment of diffusion models.
Researcher Affiliation	Academia	Vineet Jain, Kusha Sareen, Mohammad Pedramfar, Siamak Ravanbakhsh. Mila Québec AI Institute, McGill University. EMAIL
Pseudocode	Yes	E. DTS and DTS⋆ algorithm. Algorithm 1: Diffusion Tree Sampling (DTS) and Diffusion Tree Search (DTS⋆). Algorithm 2: Diffusion Tree Sampling (DTS) and Diffusion Tree Search (DTS⋆) inference.
Open Source Code	Yes	Project page: https://diffusion-tree-sampling.github.io. We have released the code publicly and provided the link in the abstract.
Open Datasets	Yes	On MNIST and CIFAR-10 class-conditional generation, DTS matches the FID of the best-performing baseline with up to 10× less compute. We use Stable Diffusion v1.5 [64], a latent diffusion model, as the prior over 512 × 512 images x ∼ pθ(x | y) where y denotes the text prompt. We evaluate DTS on text generation using MDLM [68], a discrete diffusion language model. We use a BERT-based classifier trained on the Corpus of Linguistic Acceptability (CoLA) [83].
Dataset Splits	Yes	We train the model for 500 epochs on a training set of 10000 samples. For MNIST, we consider two settings: sampling from individual digits, and sampling from even/odd digits. For CIFAR-10, we sample from individual classes. All methods were evaluated with 5000 generated samples per class.
Hardware Specification	Yes	We report execution times on a single A100 GPU with 80 gigabytes of memory. The numbers are reported for a single NVIDIA A100 GPU with 40GB of memory using the PyTorch profiler.
Software Dependencies	No	We used the linear noise schedule with βmin = 0.001 and βmax = 0.07 and the score matching objective. The optimizer used for training was Adam [40] with a learning rate of 3 × 10⁻³. We use the DDIMScheduler from diffusers library with default parameters, except we set η = 1.0 so the inference process is stochastic like DDPMs [30].
Experiment Setup	Yes	The denoising network is an MLP that takes as input the 2-dimensional data xt and the timestep t and outputs a 2-dimensional noise prediction. The timestep is transformed using sinusoidal embeddings [80]. The network has four hidden layers of 128 dimension each with the sigmoid linear unit (SiLU, [28]) activation. We used the linear noise schedule with βmin = 0.001 and βmax = 0.07 and the score matching objective. The optimizer used for training was Adam [40] with a learning rate of 3 × 10⁻³. We train the model for 500 epochs on a training set of 10000 samples.
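
For readers trying to reproduce the quoted experiment setup, the following is a minimal sketch of two of its ingredients: the linear noise schedule with βmin = 0.001 and βmax = 0.07, and a sinusoidal timestep embedding. It is not taken from the released code. The number of diffusion steps (NUM_STEPS), the embedding width (dim), the max_period constant, and all function names are assumptions made for illustration, since the quoted text does not state them.

```python
import math

BETA_MIN = 0.001  # from the quoted experiment setup
BETA_MAX = 0.07   # from the quoted experiment setup
NUM_STEPS = 100   # assumption: the quoted text does not give the step count


def linear_betas(num_steps=NUM_STEPS, beta_min=BETA_MIN, beta_max=BETA_MAX):
    """Return noise variances spaced linearly from beta_min to beta_max."""
    if num_steps == 1:
        return [beta_min]
    step = (beta_max - beta_min) / (num_steps - 1)
    return [beta_min + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): the signal fraction left at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def sinusoidal_embedding(t, dim=16, max_period=10000.0):
    """Transformer-style sinusoidal embedding of a scalar timestep.

    dim and max_period are illustrative defaults, not values from the paper.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


if __name__ == "__main__":
    betas = linear_betas()
    print(betas[0], betas[-1])        # endpoints of the schedule
    print(alpha_bars(betas)[-1])      # remaining signal at the final step
    print(len(sinusoidal_embedding(10)))
```

In a full reproduction the embedding would be concatenated with the 2-dimensional input and fed to the four-layer, 128-unit SiLU MLP described in the table; that part is omitted here because it depends on a deep learning framework.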