Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Test-Time Scaling of Diffusion Models via Noise Trajectory Search
Authors: Vignav Ramesh, Morteza Mardani
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on EDM and Stable Diffusion reveal state-of-the-art scores for class-conditioned/text-to-image generation, exceeding baselines by up to 164% and matching/exceeding MCTS performance. |
| Researcher Affiliation | Collaboration | Vignav Ramesh (Harvard University); Morteza Mardani (NVIDIA) |
| Pseudocode | Yes | Algorithm 1: ϵ-greedy noise search; Algorithm 2: EDM Sampling Algorithm; Algorithm 3: DDIM Sampling Algorithm; Algorithm 4: MCTS noise search; Algorithm 5: Rejection Sampling; Algorithm 6: Beam Search; Algorithm 7: Zero-Order Search |
| Open Source Code | Yes | Code will also be uploaded by supplementary material deadline, providing full reproducibility. Code and data are made publicly available at this GitHub link. |
| Open Datasets | Yes | We evaluate our proposed test-time scaling approach on the Elucidated Diffusion Model (EDM) [Karras et al., 2022] for class-conditional image generation (on ImageNet) and Stable Diffusion [Rombach et al., 2022] for text-to-image generation. |
| Dataset Splits | No | The paper uses pre-trained models (EDM, Stable Diffusion) and evaluates them on generated images. It specifies the number of generated images for evaluation (e.g., "generating 36 images given random ImageNet class labels") but does not provide explicit train/test/validation splits for the data used in its own experiments. |
| Hardware Specification | Yes | Generating each sample took <1 second for naive sampling (the lowest-compute method) and <1 minute for MCTS (the highest-compute method) on a single A100 (40GB). |
| Software Dependencies | No | The paper mentions specific sampling algorithms like "EDM sampler (Alg. 2)" and "DDIM sampler (Alg. 3)" and the use of a "neural network Dθ". However, it does not provide specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or programming languages. |
| Experiment Setup | Yes | The number of denoising steps is fixed to 18. Unless otherwise specified, we use the classifier-free guidance 1.0, focusing on the simple conditional generation task without guidance [Ho and Salimans, 2022]. We set λ = 0.15 and ϵ = 0.4. Re. notation, N is branching factor, B is beam width, S is number of MCTS simulations, and K is the number of local search iterations. We enable classifier-free guidance (default scale 7.5) and set T = 50 here as is typical with DDIM sampling. |
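
For readers trying to rerun the experiments, the values quoted in the Experiment Setup row can be collected into a small configuration sketch. This is not the authors' code. The class names, field names, and the grouping into two sampler settings are assumptions made for illustration, and the quoted text does not say which experiments λ and ϵ apply to, so they are kept in a separate object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerSetup:
    """Hypothetical container for one quoted sampler configuration."""
    label: str
    denoising_steps: int
    guidance_scale: float  # classifier-free guidance scale


@dataclass(frozen=True)
class SearchSetup:
    """Hypothetical container for the quoted search hyperparameters."""
    lam: float = 0.15     # λ as quoted; its role is not described in the row
    epsilon: float = 0.4  # ϵ as quoted (Algorithm 1 is an ϵ-greedy search)


# Values copied from the Experiment Setup row; the grouping is an assumption.
DEFAULT_SETTING = SamplerSetup("18-step setting", denoising_steps=18, guidance_scale=1.0)
DDIM_SETTING = SamplerSetup("DDIM setting", denoising_steps=50, guidance_scale=7.5)
SEARCH = SearchSetup()


def uses_guidance(setup: SamplerSetup) -> bool:
    """A scale of 1.0 corresponds to conditional generation without guidance."""
    return setup.guidance_scale != 1.0


if __name__ == "__main__":
    for setup in (DEFAULT_SETTING, DDIM_SETTING):
        print(setup.label, setup.denoising_steps, uses_guidance(setup))
```

Keeping the two sampler settings as separate immutable objects makes it harder to mix the 18-step, unguided values with the T = 50, scale 7.5 values when reproducing a run.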