Invisible Image Watermarks Are Provably Removable Using Generative AI
Authors: Xuandong Zhao, Kexun Zhang, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, Lei Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through formal proofs and extensive empirical evaluations, we demonstrate that pixel-level invisible watermarks are vulnerable to this regeneration attack. Our results reveal that, across four different pixel-level watermarking schemes, the proposed method consistently achieves superior performance compared to existing attack techniques, with lower detection rates and higher image quality. Code is available at https://github.com/XuandongZhao/WatermarkAttacker. |
| Researcher Affiliation | Academia | Xuandong Zhao (UC Berkeley); Kexun Zhang (Carnegie Mellon University); Zihao Su (UC Santa Barbara); Saastha Vasan (UC Santa Barbara); Ilya Grishchenko (UC Santa Barbara); Christopher Kruegel (UC Santa Barbara); Giovanni Vigna (UC Santa Barbara); Yu-Xiang Wang (UC San Diego); Lei Li (Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1 Regeneration Attack Instance: Removing invisible watermarks with a diffusion model |
| Open Source Code | Yes | Code is available at https://github.com/XuandongZhao/WatermarkAttacker. |
| Open Datasets | Yes | For real photos, we use 500 randomly selected images from the MS-COCO dataset [33]. For AI-generated images, we employ the Stable Diffusion-v2.1 model from Stable Diffusion [47], a state-of-the-art generative model capable of producing high-fidelity images. Using prompts from the Stable Diffusion Prompt (SDP) dataset, we generate 500 images encompassing both photorealistic and artistic styles. This diverse selection allows for a comprehensive evaluation of our attack on invisible watermarks. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using 500 randomly selected images for evaluation. |
| Hardware Specification | Yes | All experiments are conducted on Nvidia A6000 GPUs. |
| Software Dependencies | No | The paper mentions specific models like Stable Diffusion-v2.1 and pre-trained image compression models from the CompressAI library (Bmshj2018, Cheng2020) but does not provide specific version numbers for these software components or other libraries. |
| Experiment Setup | Yes | Watermark settings. We evaluate four publicly available pixel-level watermarking methods: DwtDctSvd [41], RivaGAN [67], StegaStamp [53], and SSL watermark [15]. ... For watermark detection, we set the decision threshold to reject the null hypothesis with p < 0.01, requiring the detection of 23 out of 32 bits or 59 out of 96 bits, respectively, for the corresponding methods, as described in Section 2.3. ... Proposed attacks. For regeneration attacks using variational autoencoders, we evaluate two pre-trained image compression models from the CompressAI library [5]: Bmshj2018 [3] and Cheng2020 [7]. Compression factors are set to [1, 2, 3, 4, 5, 6], where lower factors correspond to more heavily degraded images. For diffusion model attacks, we use the Stable Diffusion-v2.1 model. The number of noise steps is set to [10, 30, 50, 100, 150, 200] (with σ = [0.10, 0.17, 0.23, 0.34, 0.46, 0.57]), and we employ pseudo numerical methods for diffusion models (PNDMs) [34] to generate samples. |
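The Pseudocode and Experiment Setup rows describe a regeneration attack that partially noises the watermarked image and then denoises it with Stable Diffusion-v2.1 using PNDM sampling. The snippet below is a minimal sketch of that idea using the Hugging Face diffusers img2img pipeline; the filenames and the `strength` value are placeholder assumptions, and the authors' released code implements the attack with explicit noise steps rather than this wrapper.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline, PNDMScheduler

# Same base model family named in the Experiment Setup row (SD-v2.1).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
# The paper reports PNDM sampling; swap the scheduler accordingly.
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

# Watermarked input image (placeholder filename).
img = Image.open("watermarked.png").convert("RGB").resize((512, 512))

# Partially noise the image, then denoise it back: `strength` controls how far
# into the forward diffusion the image is pushed before regeneration and stands
# in for the noise-step / sigma settings listed above (0.2 is an assumed value).
out = pipe(prompt="", image=img, strength=0.2, guidance_scale=1.0).images[0]
out.save("regenerated.png")
```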
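For the AI-generated half of the evaluation set (Open Datasets row), images are sampled from Stable Diffusion-v2.1 using SDP prompts. A hedged sketch of that generation step follows; the prompt list is a placeholder, since the SDP source is cited only via a footnote that is not reproduced here.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Placeholder prompts standing in for the 500 SDP prompts used in the paper.
prompts = [
    "a photorealistic portrait of an astronaut",
    "an oil painting of a harbor at dusk",
]

for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]
    image.save(f"sdp_generated_{i:03d}.png")
```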
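The VAE-based regeneration attack (Software Dependencies and Experiment Setup rows) amounts to a round trip through a pretrained learned-compression autoencoder from CompressAI. A minimal sketch is below; whether the paper uses the factorized or hyperprior variant of Bmshj2018 is not stated in the quoted text, so `bmshj2018_factorized` and quality level 3 are assumptions.

```python
import torch
from PIL import Image
from torchvision import transforms
from compressai.zoo import bmshj2018_factorized  # Cheng2020 models are also in compressai.zoo

# Pretrained learned-compression model; the quality level plays the role of the
# paper's "compression factors" 1-6 (lower = stronger degradation).
net = bmshj2018_factorized(quality=3, pretrained=True).eval()

# Resize so height/width are multiples the encoder's downsampling can handle.
img = Image.open("watermarked.png").convert("RGB").resize((512, 512))
x = transforms.ToTensor()(img).unsqueeze(0)

with torch.no_grad():
    out = net(x)                      # encode to a latent and decode back
x_hat = out["x_hat"].clamp(0.0, 1.0)  # regenerated image, ideally watermark-free

transforms.ToPILImage()(x_hat.squeeze(0)).save("regenerated_vae.png")
```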
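The 23-of-32 and 59-of-96 detection thresholds quoted in the Experiment Setup row follow from a one-sided binomial test on the number of decoded bits matching the watermark key (Section 2.3 of the paper). The helper below sketches that decision rule; the exact counting and inequality conventions in the authors' code may differ, so thresholds derived this way could differ by one bit from the quoted values.

```python
from scipy.stats import binom

def is_watermarked(decoded_bits, key_bits, alpha=0.01):
    """Bit-matching detection rule: reject the null hypothesis that the decoder
    output is independent of the key (each bit matches with probability 0.5)."""
    n = len(key_bits)
    matches = sum(int(a == b) for a, b in zip(decoded_bits, key_bits))
    # One-sided tail probability P(X >= matches) for X ~ Binomial(n, 0.5).
    p_value = binom.sf(matches - 1, n, 0.5)
    return p_value < alpha, matches, p_value

# Example usage with hypothetical decoder output (bit values are placeholders).
decoded = [1, 0, 1, 1] * 8   # 32 decoded bits, 24 of which match the key
key     = [1, 0, 1, 0] * 8   # 32-bit watermark key
print(is_watermarked(decoded, key))
```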