Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models

Authors: Jindong Yang, Han Fang, Weiming Zhang, Nenghai Yu, Kejiang Chen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments. 4.1 Experimental Setup. Implementation Details. Our image generation backbone is Stable Diffusion v2.1 (SD v2.1) [3], configured with a guidance scale of 7.5, 50 DDIM denoising steps, and a fixed 512 × 512 output resolution. T2SMark employs a 16-bit session key and a 256-bit watermark. We set the truncation threshold to τ = 0.674. The session key is embedded in the first channel of the initial noise. This is determined empirically in our parameter selection study (see Appendix B.2). All the experiments are implemented in PyTorch 2.4.1 and run on a single NVIDIA RTX A6000 GPU. Baselines. We compare against three categories of existing methods. Traditional post-processing transforms include dwtDct [18], dwtDctSvd [18], and the learning-based RivaGAN [19]; fine-tuning approaches are represented by Stable Signature [22]; inversion-based schemes include Tree Ring (TRW) [10], Gaussian Shading (GS) [5] and PRC-Watermark (PRCW) [6]. For all inversion-based methods, we perform 10-step DDIM inversion [2]. During inversion, we employ an empty prompt and fix the guidance scale at 1 to simulate unknown prompt conditions. To ensure fair capacity, dwtDct, dwtDctSvd, Gaussian Shading, and PRC-Watermark all embed 256 bits. RivaGAN and Stable Signature use 32 bits and 48 bits respectively, following their official implementations. Evaluation. We evaluate on MS-COCO-2017 [28] dataset (COCO) and Stable-Diffusion-Prompt3 dataset (SDP). For robustness, we compare the TPR at a fixed FPR = 10⁻⁶ in the detection setting and per-bit accuracy in the traceability setting. For each method, we sample 500 prompts from the SDP training split, generate 500 watermarked images, apply nine different distortions (see Figure 4), and then perform detection and traceability.
Researcher Affiliation	Academia	Jindong Yang1,2, Han Fang3, Weiming Zhang1,2, Nenghai Yu1,2, Kejiang Chen1,2; 1 University of Science and Technology of China; 2 Anhui Province Key Laboratory of Digital Security; 3 National University of Singapore
Pseudocode	No	The paper describes the methodology in prose within Section 3 "Method" and its subsections (3.3 Watermark Encoding, 3.4 Watermark Decoding), but does not include a distinct pseudocode or algorithm block.
Open Source Code	Yes	Our code is provided in the supplementary material, with detailed usage instructions available in the accompanying README.
Open Datasets	Yes	We evaluate on MS-COCO-2017 [28] dataset (COCO) and Stable-Diffusion-Prompt3 dataset (SDP).
Dataset Splits	Yes	For each method, we sample 500 prompts from the SDP training split, generate 500 watermarked images, apply nine different distortions (see Figure 4), and then perform detection and traceability. ... The training set for each method consisted of 8,000 watermarked and 8,000 clean samples, while the test set contained 500 samples each.
Hardware Specification	Yes	All the experiments are implemented in PyTorch 2.4.1 and run on a single NVIDIA RTX A6000 GPU. Experiments were run on a system equipped with an AMD EPYC 7662 Processor and an NVIDIA RTX A6000 GPU.
Software Dependencies	Yes	All the experiments are implemented in PyTorch 2.4.1.
Experiment Setup	Yes	Our image generation backbone is Stable Diffusion v2.1 (SD v2.1) [3], configured with a guidance scale of 7.5, 50 DDIM denoising steps, and a fixed 512 × 512 output resolution. T2SMark employs a 16-bit session key and a 256-bit watermark. We set the truncation threshold to τ = 0.674. Training was conducted for 10 epochs with a batch size of 128 and a learning rate of 1 × 10⁻⁴.
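
Illustrative sketch (not from the paper): the evaluation quoted above reports TPR at a fixed FPR of 10⁻⁶ for detection and per-bit accuracy for traceability. The Python below shows one common way such quantities are computed for bit-string watermarks. It assumes that, for an unwatermarked image, each decoded bit matches the reference bit independently with probability 0.5, so the match count follows a Binomial(n, 0.5) distribution. That null model and the function names (tail_probability, min_matches_for_fpr, bit_accuracy) are assumptions made for illustration. They are not confirmed to be T2SMark's actual detector or code.

```python
from math import comb


def tail_probability(n_bits: int, k: int) -> float:
    """P[X >= k] for X ~ Binomial(n_bits, 0.5), summed with exact integers."""
    count = sum(comb(n_bits, i) for i in range(max(k, 0), n_bits + 1))
    return count / 2 ** n_bits


def min_matches_for_fpr(n_bits: int, target_fpr: float) -> int:
    """Smallest match count k with P[X >= k] <= target_fpr under the null.

    Returns n_bits + 1 if even a perfect match is too likely by chance
    (no usable threshold exists for that watermark length).
    """
    total = 2 ** n_bits
    tail = 0  # number of bit strings with at least k matching bits
    for k in range(n_bits, -1, -1):
        tail += comb(n_bits, k)
        if tail / total > target_fpr:
            return k + 1  # the previous k was the last one within budget
    return 0  # only reached when target_fpr >= 1


def bit_accuracy(embedded, decoded) -> float:
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if len(embedded) != len(decoded) or not embedded:
        raise ValueError("bit sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(embedded, decoded) if a == b)
    return matches / len(embedded)


if __name__ == "__main__":
    # 256-bit watermark and FPR = 1e-6, as in the setup quoted above.
    threshold = min_matches_for_fpr(256, 1e-6)
    print("declare 'watermarked' when matches >=", threshold)
    print("false-positive rate at that threshold:", tail_probability(256, threshold))
```

Under these assumptions, TPR would be the fraction of watermarked (and possibly distorted) images whose match count reaches the threshold. The exact integer sum avoids floating-point underflow in the far tail of the distribution.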