Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Seeds of Structure: Patch PCA Reveals Universal Compositional Cues in Diffusion Models

Authors: Qingsong Wang, Zhengchao Wan, Misha Belkin, Yusu Wang

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We investigate this relationship using patch-wise Principal Component Analysis (PCA) and empirically demonstrate that low-frequency components of the initial noise predominantly influence the compositional structure of generated images. Our analyses reveal that noise seeds inherently contain universal compositional cues, evident when identical seeds produce images with similar structural attributes across different datasets and model architectures. Leveraging these insights, we develop and theoretically justify a simple yet effective Patch PCA denoiser that extracts underlying structure from noise using only generic natural image statistics. The robustness of these structural cues is observed to persist across both pixel-space models and latent diffusion models, highlighting their fundamental nature. Finally, we introduce a zero-shot editing method that enables injecting compositional control over generated images, providing an intuitive approach to guided generation without requiring model fine-tuning or additional training.
Researcher Affiliation Academia Qingsong Wang Halıcıoğlu Data Science Institute University of California, San Diego La Jolla, CA 92093 EMAIL Zhengchao Wan Department of Mathematics University of Missouri Columbia, MO 65211 EMAIL Mikhail Belkin Halıcıoğlu Data Science Institute University of California, San Diego La Jolla, CA 92093 EMAIL Yusu Wang Halıcıoğlu Data Science Institute University of California, San Diego La Jolla, CA 92093 EMAIL
Pseudocode Yes Algorithm 1 Patch-based PCA Noise Editing with Frequency Band Control 1: Input: Original noise $z \in \mathbb{R}^{64 \times 64 \times 3}$, orthonormal Patch-PCA basis $\{u_i\}_{i=1}^{K}$, cutoff index $n$ with $1 \le n \le K$, where $K = 3p^2$ 2: Output: Modified noise $\tilde{z}$ with resampled high-frequency components 3: Partition $z$ into disjoint patches $\{z_i\}_{i=1}^{N}$ of size $p \times p$ (we use non-overlapping $4 \times 4$ patches in experiments). 4: for each patch $z_i$ do 5: Decompose the patch into the Patch-PCA basis: $\sum_{j=1}^{p^2} a_{i,j} u_j$, $a_{i,j} = \langle z_i, u_j \rangle$
Open Source Code No The pre-trained models used in our experiments, namely the diffusion models trained on the ImageNet, FFHQ, and AFHQ datasets, are available from the official repository of Karras et al. [17]: https://github.com/NVlabs/edm and the transformer-based U-ViT model is available from the official repository of Bao et al. [4]: https://github.com/baofff/U-ViT. We also train flow matching on the AFHQ dataset using the repository of Tong et al. [31]: https://github.com/atong01/conditional-flow-matching.
Open Datasets Yes The datasets used in our experiments are ImageNet [8], FFHQ [16], AFHQ [6], and CIFAR-10 [18].
Dataset Splits Yes We first compute a generic patch PCA covariance matrix using 4 × 4 patches extracted from 10,000 randomly sampled images from the ImageNet dataset and compute the covariance matrix with its eigen-decomposition. We fix 128 random noise initializations and generate reference images using the pre-trained models. We establish random noise initializations (seeds 0-3), with unedited generation results shown in Figure 7a.
Hardware Specification No We also acknowledge the Delta HPC cluster at the NCSA (National Center for Supercomputing Applications), University of Illinois, accessed via the NSF ACCESS program (allocation no. TG-CIS220009).
Software Dependencies Yes Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the algorithm in the paper which can be used to reproduce the results. Guidelines: ... Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: We provide the experimental setup and details in the main paper and appendix. Guidelines: ... Question: Does the paper provide SPECIFIC ANCILLARY SOFTWARE DETAILS (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment? Answer: [Yes] Justification: We provide error bars in our experiments. Guidelines: ...
Experiment Setup Yes Methodology: We first compute a generic patch PCA covariance matrix using 4 × 4 patches extracted from 10,000 randomly sampled images from the ImageNet dataset and compute the covariance matrix with its eigen-decomposition. We conduct experiments using pre-trained EDM models [17] on diverse datasets: ImageNet [8] (64 × 64), FFHQ, AFHQ, and CIFAR-10 [18]. We fix 128 random noise initializations and generate reference images using the pre-trained models. For each noise initialization, and for each frequency band [n, 48] (where n ranges from 0 to 48), we generate 100 perturbed variants by subdividing the noise tensor into non-overlapping 4 × 4 patches and resampling the identified frequency bands [n, 48] on each patch; see Algorithm 2 for the details. In our experiments, we found the approach robust to different patch sizes ranging from 5 to 31. We use the Patch PCA denoiser in the DDIM sampling [28] (see Equation (1)) to generate images.
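
The patch PCA basis and the band-resampling step quoted in the Pseudocode and Experiment Setup rows can be illustrated with a minimal NumPy sketch. This is not the authors' code. The function names (`extract_patches`, `assemble_patches`, `patch_pca_basis`, `resample_band`) are hypothetical, and two points are assumptions rather than details confirmed by the quoted text: components are ordered by decreasing eigenvalue and treated as running from low to high frequency, and the resampled coefficients are drawn i.i.d. from a standard normal (which is the distribution of the coefficients of standard Gaussian noise in any orthonormal basis).

```python
import numpy as np


def extract_patches(img, p):
    """Split an (H, W, C) array into non-overlapping p x p patches.

    Returns an array of shape (num_patches, p * p * C), one flattened patch per row.
    """
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by p"
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * C)


def assemble_patches(patches, shape, p):
    """Inverse of extract_patches: rebuild the (H, W, C) array."""
    H, W, C = shape
    x = patches.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)


def patch_pca_basis(images, p):
    """Eigen-decompose the patch covariance matrix of a set of images.

    Returns (eigenvalues, basis); basis columns are orthonormal and sorted
    by decreasing eigenvalue (assumed here to mean low frequency first).
    """
    X = np.concatenate([extract_patches(im, p) for im in images], axis=0)
    X = X - X.mean(axis=0, keepdims=True)
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def resample_band(z, basis, n, p, rng):
    """Keep the first n patch-PCA coefficients of z and resample the rest.

    Assumption: resampled coefficients are i.i.d. standard normal.
    """
    coeffs = extract_patches(z, p) @ basis  # a_{i,j} = <z_i, u_j>
    coeffs[:, n:] = rng.standard_normal(coeffs[:, n:].shape)
    return assemble_patches(coeffs @ basis.T, z.shape, p)
```

With 4 × 4 patches on three channels the basis has K = 48 components, so `resample_band(z, basis, n, 4, rng)` leaves the first `n` coefficients of every patch untouched and redraws the remaining `48 - n`; setting `n = 48` returns the original noise.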