Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Authors: Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental evaluation is designed to assess the efficacy of our objective for the popular setting of text-to-image (T2I) models. We benchmark the noise hypernetwork against established methods... We present our main quantitative results on the GenEval benchmark in Table 1.
Researcher Affiliation	Collaboration	1 Technical University of Munich, 2 Munich Center of Machine Learning, 3 Helmholtz Munich, 4 University of Tübingen, 5 Inceptive, 6 Google
Pseudocode	Yes	Algorithm 1 HyperNoise. 1: Input: g_θ (distilled generative model), r (reward fn), optional C = {c_i}_{i=1}^N (condition dataset). 2: Initialize noise hypernetwork f_ϕ(·) = 0 through LoRA weights ϕ applied on top of g_θ. 3: while training do. 4: Sample noise x_0 ∼ N(0, I), c = ∅. 5: if C then. 6: Sample condition c ∼ C. 7: Predict modulated noise Δx_0 = f_ϕ(x_0, c). 8: Generate x_1 = g_θ(x_0 + Δx_0, c). 9: Compute loss L_noise(ϕ) = ½‖Δx_0‖² − r(x_1). 10: Gradient step on ∇_ϕ L_noise(ϕ). 11: return noise hypernetwork LoRA weights ϕ.
Open Source Code	Yes	Code is available at https://github.com/ExplainableML/HyperNoise.
Open Datasets	Yes	Training for the noise hypernetwork is performed using ~70k prompts from Pick-a-Picv2 [48], T2ICompbench train set [37], and Attribute Binding (ABC-6K) [25] prompts. Our evaluations of the trained models are performed on GenEval [26].
Dataset Splits	Yes	Training for the noise hypernetwork is performed using ~70k prompts from Pick-a-Picv2 [48], T2ICompbench train set [37], and Attribute Binding (ABC-6K) [25] prompts. Our evaluations of the trained models are performed on GenEval [26], ensuring that the training and evaluation prompts do not have any overlap, measuring the generalization of the noise hypernetwork to unseen prompts.
Hardware Specification	Yes	This experiment was conducted on 1 H100 GPU. (Section B.1) All training runs were conducted on 6 H100 GPUs. (Section B.2)
Software Dependencies	No	Additionally, we employ Pytorch Memsave [7] to all models, which further reduces the needed GPU memory during training enabling us to use larger batch sizes. We run all experiments in bfloat16.
Experiment Setup	Yes	We provide the full hyperparameters in Table 3. This experiment was conducted on 1 H100 GPU. (Section B.1) Table 4: Hyperparameters for the Human-preference Reward setting (Section B.2)
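
The training loop quoted in the Pseudocode row can be illustrated with a toy, one-dimensional sketch. This is not the authors' implementation. It assumes the objective is one half of the squared predicted noise offset minus the reward, and it covers only the unconditional case (no condition c). The identity `generator`, the quadratic `reward`, and the affine `hypernet` are hypothetical stand-ins chosen so the gradient can be written by hand without an autodiff library. The LoRA parameterization is replaced by two scalar weights.

```python
import random


def generator(x0):
    """Hypothetical stand-in for the frozen distilled generator g_theta."""
    return x0  # identity map, chosen only to keep the example tiny


def reward(x1, target):
    """Hypothetical reward: largest when the generated sample equals `target`."""
    return -0.5 * (x1 - target) ** 2


def hypernet(phi, x0):
    """Toy noise hypernetwork f_phi(x0) = a * x0 + b with phi = (a, b)."""
    a, b = phi
    return a * x0 + b


def loss_and_grad(phi, batch, target):
    """Batch mean of 0.5 * delta^2 - r(g(x0 + delta)) and its gradient in (a, b)."""
    loss = grad_a = grad_b = 0.0
    for x0 in batch:
        delta = hypernet(phi, x0)       # predicted noise offset
        x1 = generator(x0 + delta)      # sample generated from the shifted noise
        loss += 0.5 * delta ** 2 - reward(x1, target)
        # d(loss)/d(delta), valid for the identity generator and quadratic reward
        d_delta = delta + (x1 - target)
        grad_a += d_delta * x0          # chain rule through delta = a * x0 + b
        grad_b += d_delta
    n = len(batch)
    return loss / n, (grad_a / n, grad_b / n)


def train(target=2.0, steps=500, batch_size=32, lr=0.1, seed=0):
    """Plain gradient descent on phi, starting from the zero hypernetwork."""
    rng = random.Random(seed)
    phi = (0.0, 0.0)  # zero init, so the offset is 0 before training
    for _ in range(steps):
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]  # x0 ~ N(0, 1)
        _, (grad_a, grad_b) = loss_and_grad(phi, batch, target)
        phi = (phi[0] - lr * grad_a, phi[1] - lr * grad_b)
    return phi


if __name__ == "__main__":
    a, b = train()
    print(f"learned a={a:.3f}, b={b:.3f}")
```

In this toy setting the per-sample loss is minimized by an offset of (target − x_0)/2, so training should drive `a` toward −0.5 and `b` toward target/2. The squared-offset term keeps the shifted noise from moving all the way to the reward maximum. With a real generator and reward model, the hand-written gradient would be replaced by automatic differentiation.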