Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models

Authors: Jisung Hwang, Jaihoon Kim, Minhyuk Sung

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We showcase the application of our regularization in generative modeling for test-time reward alignment with a text-to-image model, specifically to enhance aesthetics and text alignment. Our regularization outperforms previous Gaussianity regularization, effectively prevents reward hacking and accelerates convergence. ... We conduct a toy experiment using the image generative model FLUX [20] to evaluate how different regularization terms guide a latent vector toward a standard Gaussian distribution when optimized from a highly structured initialization. ... We present quantitative and qualitative results in Figure 4 and Figure 5, respectively.
Researcher Affiliation	Academia	Jisung Hwang Jaihoon Kim Minhyuk Sung KAIST EMAIL
Pseudocode	No	The paper describes methods and proofs using mathematical equations and text, but does not present any structured pseudocode or algorithm blocks.
Open Source Code	No	We will publicly release the code upon acceptance.
Open Datasets	Yes	We evaluate on the animal prompts from DDPO [5]. We report the given reward along with held-out rewards ImageReward [36] and HPSv2 [35]... For text-aligned image generation, we use PickScore [19] as the given reward... We evaluate on 60 prompts sampled from T2I-CompBench++ [15], comprising 10 prompts from each of the six categories: 3D spatial, complex, non-spatial, shape, spatial, and texture.
Dataset Splits	Yes	We evaluate on 60 prompts sampled from T2I-CompBench++ [15], comprising 10 prompts from each of the six categories: 3D spatial, complex, non-spatial, shape, spatial, and texture.
Hardware Specification	Yes	All experiments were conducted on an NVIDIA A6000 GPU with 48GB VRAM, taking approximately 2 minutes per 100 optimization iterations.
Software Dependencies	No	The paper mentions using FLUX [20] as the base generative model, but does not specify any version numbers for software libraries, frameworks, or other dependencies.
Experiment Setup	Yes	We initialize the latent vector from the prior distribution (a unit Gaussian) and perform optimization for 500 iterations using Nesterov momentum with a coefficient of 0.9 and gradient clipping set to 0.01. The generated images are evaluated every 100 iterations. The learning rate is set to 0.1 for aesthetic score [28] and 1.0 for PickScore [19]. We set the regularization coefficient to 2.0 for all regularization methods. The regularization gradient is normalized and scaled to match the magnitude of the reward gradient, ensuring balanced contributions during optimization.
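
The Experiment Setup entry above can be illustrated with a minimal NumPy sketch. This is not the authors' code, which had not been released according to the Open Source Code entry. The reward (`toy_reward_grad`) and the penalty (`toy_reg_grad`, a simple mean and variance matching term) are hypothetical stand-ins for the paper's reward models and its moment- and power-spectrum-based regularizer. Clipping by global norm and the PyTorch-style Nesterov update are also assumptions, since the quoted text does not specify either. Only the hyperparameter values (500 iterations, momentum 0.9, clipping 0.01, learning rate 0.1, regularization coefficient 2.0) come from the quoted setup.

```python
import numpy as np

def rescale_to_match(reg_grad, reward_grad, eps=1e-12):
    """Normalize reg_grad, then scale it to the norm of reward_grad."""
    unit = reg_grad / (np.linalg.norm(reg_grad) + eps)
    return unit * np.linalg.norm(reward_grad)

def clip_by_norm(grad, max_norm):
    """Clip the gradient by its global L2 norm (assumed clipping mode)."""
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)

def toy_reward_grad(z, target):
    """Ascent direction of the hypothetical reward -0.5 * ||z - target||^2."""
    return target - z

def toy_reg_grad(z):
    """Gradient of the hypothetical penalty mean(z)^2 + (var(z) - 1)^2."""
    n = z.size
    mean, var = z.mean(), z.var()
    d_mean = np.full_like(z, 1.0 / n)   # d mean / d z_i
    d_var = 2.0 * (z - mean) / n        # d var / d z_i
    return 2.0 * mean * d_mean + 2.0 * (var - 1.0) * d_var

def optimize_latent(z0, target, lr=0.1, momentum=0.9, clip=0.01,
                    reg_coeff=2.0, steps=500):
    """Minimize -reward + reg_coeff * penalty with Nesterov momentum."""
    z = z0.astype(float).copy()
    velocity = np.zeros_like(z)
    for _ in range(steps):
        g_reward = toy_reward_grad(z, target)
        # Balance the two terms: the penalty gradient takes the reward
        # gradient's magnitude before the coefficient is applied.
        g_reg = rescale_to_match(toy_reg_grad(z), g_reward)
        grad = clip_by_norm(-g_reward + reg_coeff * g_reg, clip)
        # PyTorch-style Nesterov update (an assumed formulation).
        velocity = momentum * velocity + grad
        z = z - lr * (grad + momentum * velocity)
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_init = rng.standard_normal(64)    # latent drawn from a unit Gaussian
    z_final = optimize_latent(z_init, target=np.ones(64))
    print(z_final.mean(), z_final.var())
```

Because the penalty gradient is rescaled to the reward gradient's norm at every step, the coefficient of 2.0 acts as a relative weight between the two terms rather than an absolute one, which is one plausible reading of "balanced contributions" in the quoted setup.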