Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Authors: Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy, Nataniel Ruiz, Zeynep Akata
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation is designed to assess the efficacy of our objective for the popular setting of text-to-image (T2I) models. We benchmark the noise hypernetwork against established methods... We present our main quantitative results on the GenEval benchmark in Table 1. |
| Researcher Affiliation | Collaboration | ¹Technical University of Munich, ²Munich Center of Machine Learning, ³Helmholtz Munich, ⁴University of Tübingen, ⁵Inceptive, ⁶Google |
| Pseudocode | Yes | Algorithm 1 HyperNoise 1: Input: $g_\theta$ (distilled generative model), $r$ (reward fn), optional $C = \{c_i\}_{i=1}^{N}$ (condition dataset) 2: Initialize Noise Hypernetwork $f_\phi(\cdot) = 0$ through LoRA weights $\phi$ applied on top of $g_\theta$ 3: while training do 4: Sample noise $x_0 \sim \mathcal{N}(0, I)$, $c = \emptyset$ 5: if $C$ then 6: Sample condition $c \sim C$ 7: Predict modulated noise $\Delta x_0 = f_\phi(x_0, c)$ 8: Generate $x_1 = g_\theta(x_0 + \Delta x_0, c)$ 9: Compute loss $L_{\text{noise}}(\phi) = \tfrac{1}{2} \lVert \Delta x_0 \rVert^2 - r(x_1)$ 10: Gradient step on $\nabla_\phi L_{\text{noise}}(\phi)$ 11: return Noise Hypernetwork LoRA weights $\phi$ |
| Open Source Code | Yes | Code is available at https://github.com/ExplainableML/HyperNoise. |
| Open Datasets | Yes | Training for the noise hypernetwork is performed using ~70k prompts from Pick-a-Picv2 [48], T2ICompbench train set [37], and Attribute Binding (ABC-6K) [25] prompts. Our evaluations of the trained models are performed on GenEval [26]. |
| Dataset Splits | Yes | Training for the noise hypernetwork is performed using ~70k prompts from Pick-a-Picv2 [48], T2ICompbench train set [37], and Attribute Binding (ABC-6K) [25] prompts. Our evaluations of the trained models are performed on GenEval [26], ensuring that the training and evaluation prompts do not have any overlap, measuring the generalization of the noise hypernetwork to unseen prompts. |
| Hardware Specification | Yes | This experiment was conducted on 1 H100 GPU. (Section B.1) All training runs were conducted on 6 H100 GPUs. (Section B.2) |
| Software Dependencies | No | Additionally, we employ Pytorch Memsave [7] to all models, which further reduces the needed GPU memory during training enabling us to use larger batch sizes. We run all experiments in bfloat16. |
| Experiment Setup | Yes | We provide the full hyperparameters in Table 3. This experiment was conducted on 1 H100 GPU. (Section B.1) Table 4: Hyperparameters for the Human-preference Reward setting (Section B.2) |
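
The training loop quoted in the Pseudocode row can be illustrated with a minimal, dependency-free sketch. It is a toy under stated assumptions, not the authors' implementation: the generator is a hypothetical identity map, the reward is a hypothetical quadratic, the hypernetwork is a two-parameter affine map rather than LoRA weights on a diffusion model, the condition set is empty, and gradients are written out by hand. The loss is read as half the squared norm of the predicted noise residual minus the reward, which is an interpretation of the extracted text.

```python
import random


def generator(x):
    # Stand-in for the frozen distilled generator g_theta (hypothetical: identity map).
    return x


def reward(x1, target=3.0):
    # Hypothetical differentiable reward, highest when the sample equals `target`.
    return -(x1 - target) ** 2


def hypernet(phi, x0):
    # Toy noise hypernetwork f_phi: predicts a residual added to the initial noise.
    a, b = phi
    return a * x0 + b


def loss(phi, x0, target=3.0):
    # L_noise = 1/2 * delta^2 - r(g(x0 + delta)), as read from the quoted algorithm.
    delta = hypernet(phi, x0)
    return 0.5 * delta ** 2 - reward(generator(x0 + delta), target)


def train(steps=500, batch_size=32, lr=0.05, target=3.0, seed=0):
    rng = random.Random(seed)
    phi = [0.0, 0.0]  # zero initialisation, so f_phi(x0) = 0 before training
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for _ in range(batch_size):
            x0 = rng.gauss(0.0, 1.0)    # sample noise x0 ~ N(0, 1)
            delta = hypernet(phi, x0)   # predicted noise residual
            x1 = generator(x0 + delta)  # generate from the modulated noise
            # Hand-derived dL/d(delta) for this toy (generator slope is 1).
            dl_ddelta = delta + 2.0 * (x1 - target)
            grad_a += dl_ddelta * x0    # d(delta)/da = x0
            grad_b += dl_ddelta         # d(delta)/db = 1
        # Averaged gradient step on the hypernetwork parameters only.
        phi[0] -= lr * grad_a / batch_size
        phi[1] -= lr * grad_b / batch_size
    return phi


if __name__ == "__main__":
    phi = train()
    print("learned parameters:", phi)
    print("loss before:", loss([0.0, 0.0], 0.5), "loss after:", loss(phi, 0.5))
```

In this toy the squared-norm term keeps the modulated noise close to the original sample while the reward term pulls the output toward the target, so the learned residual only moves each sample part of the way to the target. In practice an autodiff framework would replace the hand-written gradients.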