Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models
Authors: Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, Boaz Barak
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the feasibility of our attack by instantiating it to attack three existing watermarking schemes for large language models: Kirchenbauer et al. (2023a), Kuditipudi et al. (2023), and Zhao et al. (2023a), and include preliminary results on vision-language models. The same attack schema successfully removes the watermarks planted by all schemes, with only minor quality degradation. |
| Researcher Affiliation | Academia | ¹Harvard University, ²George Mason University, ³Sapienza University of Rome. |
| Pseudocode | Yes | Algorithm 1: Pseudocode for our attack (see the hedged Python sketch after the table) |
| Open Source Code | Yes | Experiments were all performed on 40GiB A100s and our code is available at https://github.com/hlzhang109/impossibility-watermark. |
| Open Datasets | Yes | All three of the watermarking schemes we attack were originally tested on the Real News subset of the C4 dataset (Raffel et al., 2020), so we use this as our primary task as well. |
| Dataset Splits | No | The paper mentions using the C4 dataset and training models, but it does not provide specific details on training, validation, or test splits (e.g., percentages, sample counts, or explicit predefined splits for its experiments). |
| Hardware Specification | Yes | Experiments were all performed on 40GiB A100s |
| Software Dependencies | No | The paper names the models it uses (Llama-2-7B, T5-XL v1.1, and RoBERTa-v3 large), but it does not specify version numbers for general software dependencies or libraries (e.g., PyTorch or TensorFlow versions). |
| Experiment Setup | Yes | Specifically, we give Llama-2-7B (Touvron et al., 2023) the task of generating a completion given the first 20 tokens of a news article. Except where otherwise noted, we default to a generation length of 200 tokens. ... Table 4: Default hyperparameters of our attack for LM watermarks — Attack steps: 200 / 300 / 300 (KGW / EXP / Unigram); Secret key: 15485863 / 42 / 0; z stopping threshold: 1.645 (see the note after the table); Max watermarked length: {200, 512}; top-p of P: 0.95; Span length (l): {4, 6, 8}; Num of spans: 1; Min infill length: l; Max infill length: 1.5l |
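
The attack that Algorithm 1 describes is a quality-guided random walk over the space of texts. Below is a minimal Python sketch of that loop, assuming hypothetical `quality_oracle`, `perturbation_oracle`, and `detector_z` callables as placeholders for the paper's components (a reward/comparison model, T5-style span infilling, and each scheme's watermark detector); this is an illustration of the idea, not the repository's actual API.

```python
def remove_watermark(text, quality_oracle, perturbation_oracle, detector_z,
                     max_steps=200, z_threshold=1.645):
    """Random-walk watermark removal (sketch of the paper's Algorithm 1).

    Repeatedly perturbs the watermarked text, keeping only perturbations
    the quality oracle accepts, until the detector score drops below the
    stopping threshold or the step budget runs out.
    """
    current = text
    for _ in range(max_steps):
        if detector_z(current) < z_threshold:
            return current  # watermark no longer detected
        # Propose a local edit, e.g., mask and re-infill a random span.
        candidate = perturbation_oracle(current)
        # Accept the edit only if quality relative to the original is preserved.
        if quality_oracle(text, candidate):
            current = candidate
    return current
```

The defaults mirror Table 4's attack steps and z stopping threshold; in the paper's instantiation the perturbation oracle is T5-based span infilling and the quality oracle is a reward or comparison model (e.g., the RoBERTa model mentioned above).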
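
As context for the z stopping threshold in Table 4: 1.645 is the one-sided 95% critical value of a standard normal. For green-list schemes such as Kirchenbauer et al. (2023a), the detector computes (that paper's statistic, reproduced here with its notation):

$$z = \frac{|s|_G - \gamma T}{\sqrt{T\,\gamma\,(1-\gamma)}}$$

where $T$ is the number of scored tokens, $\gamma$ the green-list fraction, and $|s|_G$ the count of green-listed tokens; the attack stops once $z < 1.645$, i.e., once the text no longer triggers detection at the 5% significance level.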