Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Shallow Diffuse: Robust and Invisible Watermarking through Low-Dim Subspaces in Diffusion Models

Authors: Wenda Li, Huijie Zhang, Qing Qu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present a comprehensive set of experiments to demonstrate the robustness and consistency of Shallow-Diffuse across various datasets. We begin by highlighting its performance in terms of robustness and consistency in both the server scenario (Section 5.1) and the user scenario (Section 5.2). We further explore the trade-off between robustness and consistency in Section 5.3. Table 1: Generation quality, consistency and watermark robustness under the server scenario. Bold indicates the best overall performance; underline denotes the best among diffusion-based methods. Evaluation datasets. Evaluation metrics. Attacks.
Researcher Affiliation | Academia | Wenda Li, Huijie Zhang, Qing Qu; Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor
Pseudocode | Yes | Algorithm 1: Unconditional Shallow Diffuse
    Inject watermark:
      Input: original image x_0 for the user scenario (initial random seed x_T for the server scenario), watermark λ x, embedding timestep t
      Output: watermarked image x_0^W
      if user scenario then
        x_t = DDIM-Inv(x_0, t)
      else (server scenario)
        x_t = DDIM(x_T, t)
      end if
      x_t^W ← x_t + λ x; x_0^W ← DDIM(x_t^W, 0)   (embed watermark)
      Return: x_0^W
    Detect watermark:
      Input: attacked image x_0^W, watermark λ x, embedding timestep t
      Output: distance score η
      x_t^W ← DDIM-Inv(x_0^W, t)
      η = Detector(x_t^W, λ x)
      Return: η
Open Source Code | No | Answer: [No] Justification: We will make it open source in the future.
Open Datasets | Yes | For the user scenario (Section 5.2), we utilize the MS-COCO [52] and DiffusionDB [53] datasets. The first is a real-world dataset, while DiffusionDB is a collection of diffusion model-generated images. From each dataset, we select 500 images for evaluation.
Dataset Splits | Yes | For the server scenario (Section 5.1), all diffusion-based methods are based on the same Stable Diffusion, with the original images x_0 generated from identical initial seeds x_T. Non-diffusion methods are applied to these same original images x_0 in a post-watermarking process. A total of 5000 original images are generated for evaluation in this scenario. For the user scenario (Section 5.2), we utilize the MS-COCO [52] and DiffusionDB [53] datasets. The first is a real-world dataset, while DiffusionDB is a collection of diffusion model-generated images. From each dataset, we select 500 images for evaluation. For the remaining experiments in Section 5.3 and Appendix C, we use the server scenario and sample 100 images for evaluation.
Hardware Specification | Yes | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: All experiments are run over one A40 GPU.
Software Dependencies | No | We use Stable Diffusion 2-1-base [3] as the underlying model for our experiments, applying Shallow Diffuse within its latent space. To assess whether Shallow Diffuse generalizes beyond U-Net based diffusion architectures, we conducted an additional study on FLUX [73], a transformer-based diffusion model that employs a Flow Matching noise scheduler.
Experiment Setup | Yes | In practice, we choose t = 0.3T based on results from the ablation study in Section 5.4. The mask M is circular, with the white area representing 1 and the black area representing 0 in Figure 3. The mask is used to modify specific frequency bands of the image. Specifically, the circular mask M has a radius of 8. We conducted ablation studies on the number of sampling steps, using 10, 25, and 50 steps. The results, shown in Table 10, indicate that Shallow Diffuse is not highly sensitive to the number of sampling steps.
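
To make the quoted algorithm and setup more concrete, the following is a minimal Python sketch of the inject/detect flow of Algorithm 1 using the quoted settings (circular mask of radius 8, t = 0.3T). It is not the authors' implementation, which is not yet released. Everything else is an assumption made for illustration: the 64x64 array size, treating T as 50 sampling steps, the toy `scale`, `ddim`, and `ddim_inv` functions that stand in for a real diffusion model, the choice to build the watermark by writing a key into the masked frequency band, and the distance-based `detect` function.

```python
import numpy as np

SIZE = 64            # assumed array size (not stated in the table)
RADIUS = 8           # mask radius quoted in the Experiment Setup row
T = 50               # assumed: T taken as the number of sampling steps
t = round(0.3 * T)   # embedding timestep t = 0.3T


def circular_mask(size, radius):
    """Return 1 inside a centered circle and 0 outside (centered spectrum)."""
    yy, xx = np.ogrid[:size, :size]
    c = size // 2
    return ((yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2).astype(float)


def scale(step):
    """Toy stand-in for a diffusion schedule; purely illustrative."""
    return 1.0 + step / T


def ddim_inv(x0, step):
    """Hypothetical, exactly invertible replacement for DDIM inversion."""
    return x0 * scale(step)


def ddim(x, step_from, step_to):
    """Hypothetical replacement for DDIM sampling between two steps."""
    return x * scale(step_to) / scale(step_from)


def spectrum(x):
    """2-D FFT with the zero frequency shifted to the array center."""
    return np.fft.fftshift(np.fft.fft2(x))


def inject(x, key, mask, step, scenario="user"):
    """Move to step t, add the watermark there, then return to step 0."""
    xt = ddim_inv(x, step) if scenario == "user" else ddim(x, T, step)
    # Assumed watermark: the change that writes `key` into the masked band.
    target = spectrum(xt) * (1 - mask) + key * mask
    watermark = np.fft.ifft2(np.fft.ifftshift(target)).real - xt
    xt_w = xt + watermark
    return ddim(xt_w, step, 0)


def detect(x0_w, key, mask, step):
    """Invert to step t and measure the distance to the key inside the mask."""
    xt_w = ddim_inv(x0_w, step)
    return float(np.linalg.norm(mask * (spectrum(xt_w) - key)))


rng = np.random.default_rng(0)
mask = circular_mask(SIZE, RADIUS)
key = spectrum(rng.standard_normal((SIZE, SIZE)))  # spectrum of a real array
x0 = rng.standard_normal((SIZE, SIZE))

x0_w = inject(x0, key, mask, t)
eta_marked = detect(x0_w, key, mask, t)
eta_clean = detect(x0, key, mask, t)
print(f"marked: {eta_marked:.2e}, clean: {eta_clean:.2e}")
```

In this toy setting the distance score is near zero for the watermarked array and large for the unmarked one, because the stand-in inversion is exact. With a real diffusion model, DDIM inversion is only approximate and attacks perturb the image, so the score would need a detection threshold rather than an exact match.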