Leveraging Optimization for Adaptive Attacks on Image Watermarks

Authors: Nils Lukas, Abdulrahman Diaa, Lucas Fenaux, Florian Kerschbaum

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods at no visible degradation in image quality. Optimizing our attacks is efficient and requires less than 1 GPU hour to reduce the detection accuracy to 6.3% or less. Our findings emphasize the need for more rigorous robustness testing against adaptive, learnable attackers.
Researcher Affiliation | Academia | Nils Lukas, Abdulrahman Diaa, Lucas Fenaux, Florian Kerschbaum (University of Waterloo, Canada); {nlukas, abdulrahman.diaa, lucas.fenaux, florian.kerschbaum}@uwaterloo.ca
Pseudocode | Yes | Algorithm 1 (GKEYGEN: A Simple Method to Generate Differentiable Keys); Algorithm 2 (Adversarial Noising); Algorithm 3 (Adversarial Compression). An illustrative sketch of the adversarial-noising step appears after the table.
Open Source Code | No | We make it harder by not releasing our code publicly. We will, however, release our code, including pre-trained checkpoints, upon carefully considering each request.
Open Datasets | Yes | We generate 1k images to evaluate TPR@1%FPR and 5k images to evaluate FID and CLIP score on the training dataset of MS-COCO-2017 (Lin et al., 2014). A sketch of how TPR@1%FPR can be computed appears after the table.
Dataset Splits | No | The paper does not explicitly describe a validation split used for hyperparameter tuning or early stopping of the authors' own models.
Hardware Specification | Yes | All experiments were conducted on NVIDIA A100 GPUs.
Software Dependencies | No | The paper names software such as the Stable Diffusion models (v1.1, v2.0) but does not give version numbers for the programming languages or libraries (e.g., Python, PyTorch, TensorFlow) used in the authors' implementation of the attacks and key generation.
Experiment Setup | Yes | Table 1 summarizes the best attacks from Figure 2 when we set the lowest acceptable detection accuracy to 10%. When multiple attacks achieve a detection accuracy lower than 10%, we pick the attack with the lowest perceptual distance to the watermarked image. We observe that adversarial compression is an effective attack against all watermarking methods. TRW is also evaded by adversarial compression, but adversarial noising at ϵ = 2/255 preserves a higher image quality. Table 1: A summary of Pareto-optimal attacks with detection accuracies less than 10%. We list the attack's name and parameters, the perceptual distance before and after evasion, and the accuracy (TPR@1%FPR). ϵ is the maximal perturbation in the L∞ norm and r is the number of compressions. A sketch of this attack-selection rule appears after the table.
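
The Adversarial Noising algorithm referenced in the Pseudocode row can be read as a projected-gradient search for an L∞-bounded perturbation that lowers the score of a differentiable (surrogate) watermark detector. The sketch below is a minimal illustration under that reading, not the paper's released implementation; `surrogate_detector`, the step count, and the step size are assumed placeholders.

```python
import torch

def adversarial_noising(image, surrogate_detector, eps=2/255, steps=50, step_size=0.5/255):
    """L_inf-bounded perturbation that lowers a differentiable watermark score.

    image: watermarked image tensor in [0, 1], shape (1, 3, H, W).
    surrogate_detector: differentiable module mapping an image to a watermark
        score (higher = more likely watermarked); a stand-in for the attacker's
        surrogate key, not the paper's actual detector.
    eps: maximal L_inf perturbation (e.g. 2/255, as quoted in Table 1).
    """
    x_adv = image.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        score = surrogate_detector(x_adv).mean()
        grad, = torch.autograd.grad(score, x_adv)
        with torch.no_grad():
            # Step against the gradient to reduce the detection score,
            # then project back into the eps-ball around the original image.
            x_adv = x_adv - step_size * grad.sign()
            x_adv = image + (x_adv - image).clamp(-eps, eps)
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```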
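
TPR@1%FPR, used in the Open Datasets and Experiment Setup rows, is the detector's true-positive rate at the score threshold where at most 1% of non-watermarked images are flagged. A minimal sketch, assuming per-image detector scores are available; the function name and toy data are illustrative, not from the paper's code.

```python
import numpy as np

def tpr_at_fpr(pos_scores, neg_scores, target_fpr=0.01):
    """True-positive rate at a fixed false-positive rate.

    pos_scores: detector scores for watermarked images (higher = "watermarked").
    neg_scores: detector scores for clean, non-watermarked images.
    The threshold is the (1 - target_fpr) quantile of the negative scores,
    so roughly target_fpr of clean images are falsely detected.
    """
    threshold = np.quantile(neg_scores, 1.0 - target_fpr)
    return float(np.mean(pos_scores > threshold))

# Toy usage with synthetic scores (illustrative only).
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=1000)   # clean images
pos = rng.normal(3.0, 1.0, size=1000)   # watermarked images
print(f"TPR@1%FPR = {tpr_at_fpr(pos, neg):.3f}")
```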
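
The selection rule quoted in the Experiment Setup row (keep attacks whose detection accuracy is below 10%, then prefer the one with the lowest perceptual distance to the watermarked image) can be written out directly; the record fields below are assumed placeholders for whatever per-attack results are logged, not the paper's data format.

```python
def pick_best_attack(results, max_detection_acc=0.10):
    """Select the Pareto-preferred attack under the quoted rule.

    results: list of dicts with 'name', 'detection_acc' (TPR@1%FPR after the
    attack), and 'perceptual_dist' (distance to the watermarked image).
    """
    evading = [r for r in results if r["detection_acc"] < max_detection_acc]
    if not evading:
        return None  # no attack meets the detection threshold
    return min(evading, key=lambda r: r["perceptual_dist"])
```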