Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

Authors: Yepeng Liu, Yiren Song, Hai Ci, Yu Zhang, Haofan Wang, Mike Zheng Shou, Yuheng Bu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to implement our watermark removal methods across various watermark methods, including low- and high-perturbation watermarks. The results demonstrate that our CtrlRegen effectively reduces the detection performance (TPR@1%FPR) of StegaStamp from 1.00 to 0.01 and of Tree-Ring from 0.99 to 0.12. Conversely, the uncontrolled regeneration method proves less effective for these two watermark methods. Moreover, our CtrlRegen+ achieves better image quality/consistency while maintaining the same watermark removal performance compared to the uncontrolled regeneration approach.
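TPR@1%FPR in the row above is the true-positive rate measured at the detection threshold that lets at most 1% of non-watermarked images through. A minimal sketch of the metric (hypothetical detector scores, not the authors' evaluation code):

```python
import numpy as np

def tpr_at_fpr(scores_watermarked, scores_clean, target_fpr=0.01):
    """Empirical TPR at a fixed FPR: set the threshold so that at most
    `target_fpr` of clean (non-watermarked) scores exceed it, then measure
    the fraction of watermarked scores that still exceed it."""
    # Threshold = (1 - target_fpr) quantile of the clean-image scores.
    threshold = np.quantile(scores_clean, 1.0 - target_fpr)
    return float(np.mean(scores_watermarked > threshold))

# Hypothetical scores: a good detector gives watermarked images higher scores.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 10_000)
watermarked = rng.normal(4.0, 1.0, 10_000)
print(tpr_at_fpr(watermarked, clean))  # high before any removal attack
```

A successful removal attack drags the watermarked-score distribution back toward the clean one, which is exactly what a drop from 1.00 to 0.01 reflects.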
Researcher Affiliation | Collaboration | Yepeng Liu (1), Yiren Song (2), Hai Ci (2), Yu Zhang (3), Haofan Wang (4), Mike Zheng Shou (2), Yuheng Bu (1); (1) University of Florida, (2) Show Lab, National University of Singapore, (3) Tongji University, (4) InstantX Team. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | The entire inference process of CtrlRegen is outlined in Algorithm 1 of Appendix B. The inference process of CtrlRegen+ is detailed in Algorithm 2 of Appendix B.
Open Source Code | Yes | Our code is available at https://github.com/yepengliu/CtrlRegen.
Open Datasets | Yes | For the post-hoc watermarking methods, we sample 1000 real photos from the MIRFLICKR (Huiskes & Lew, 2008)... For the in-generation watermarking methods, we sample 1000 prompts from a large-scale text-to-image prompt dataset, DiffusionDB (Wang et al., 2022)... Additionally, we train the semantic control adapter using 10 million images sampled from LAION-2B (Schuhmann et al., 2022) and COYO-700M. The spatial control network is trained using 118k image-canny pairs from MSCOCO (Lin et al., 2014).
Dataset Splits | Yes | For the post-hoc watermarking methods, we sample 1000 real photos from the MIRFLICKR (Huiskes & Lew, 2008)... For the in-generation watermarking methods, we sample 1000 prompts from a large-scale text-to-image prompt dataset, DiffusionDB (Wang et al., 2022)... Additionally, we train the semantic control adapter using 10 million images sampled from LAION-2B (Schuhmann et al., 2022) and COYO-700M. The spatial control network is trained using 118k image-canny pairs from MSCOCO (Lin et al., 2014).
Hardware Specification | Yes | The training of the semantic control adapter is conducted on 8 NVIDIA A100 GPUs, and the batch size is set to 8 per GPU. The training of the spatial control network is carried out on 8 NVIDIA A100 GPUs with a batch size of 4 per GPU. At the inference stage, we conduct experiments on a single NVIDIA RTX 4090.
Software Dependencies | Yes | We employ Stable Diffusion-v1.5 (Rombach et al., 2022) as the backbone for our model, maintaining its parameters in a frozen state to preserve the original capabilities. For the semantic control adapter, we integrate DINOv2-giant (Oquab et al., 2023) as the image encoder, also keeping its parameters frozen to leverage its pre-trained strengths.
Experiment Setup | Yes | We employ Stable Diffusion-v1.5 (Rombach et al., 2022) as the backbone for our model, maintaining its parameters in a frozen state to preserve the original capabilities. For the semantic control adapter, we integrate DINOv2-giant (Oquab et al., 2023) as the image encoder, also keeping its parameters frozen to leverage its pre-trained strengths. The training of the semantic control adapter is conducted on 8 NVIDIA A100 GPUs, and the batch size is set to 8 per GPU. The training of the spatial control network is carried out on 8 NVIDIA A100 GPUs with a batch size of 4 per GPU. ... For the experimental results shown in Table 1, we set the noising and denoising steps to 70 for Regen, while Rinse applies the Regen process twice. ... For high-perturbation watermarks, such as StegaStamp and Tree-Ring, we set the noising steps to be {100, 200, 300, 400, 500, 1000} and sample from pure Gaussian noise. ... For low-perturbation watermarks, a small number of noising steps is sufficient to remove the watermark. Therefore, we evaluate the performance of CtrlRegen+ using a range of relatively low noising steps, i.e., {20, 40, 60, 80, 100, 120, 140}.
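The noising-step grids quoted above correspond to the standard DDPM forward process, where larger t drives the input toward pure Gaussian noise. A hedged numerical sketch of that process (illustrative schedule constants, not the paper's exact configuration):

```python
import numpy as np

def forward_noise(x0, t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    As t approaches num_steps, abar_t -> 0, so x_t is almost pure noise;
    this is why large noising steps (e.g. 500-1000) can erase even
    high-perturbation watermarks, at the cost of image fidelity."""
    betas = np.linspace(beta_start, beta_end, num_steps)  # linear schedule
    abar = np.cumprod(1.0 - betas)[t - 1]  # cumulative signal retention at step t
    eps = np.random.default_rng(0).standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps, abar

latent = np.ones((4, 8, 8))  # toy stand-in for an image latent
x_t, abar = forward_noise(latent, t=1000)
print(abar)  # near zero: essentially no watermark signal survives at t = 1000
```

The low-step grid {20, ..., 140} for low-perturbation watermarks keeps abar close to 1, preserving most of the image while still perturbing the watermark signal.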