Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models

Authors: Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, Boaz Barak

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the feasibility of our attack by instantiating it to attack three existing watermarking schemes for large language models: Kirchenbauer et al. (2023a), Kuditipudi et al. (2023), and Zhao et al. (2023a), and include preliminary results on vision-language models. The same attack schema successfully removes the watermarks planted by all schemes, with only minor quality degradation.
Researcher Affiliation Academia ¹Harvard University, ²George Mason University, ³Sapienza University of Rome.
Pseudocode Yes Algorithm 1 Pseudocode for our attack
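The paper's Algorithm 1 is a quality-guarded random walk: repeatedly perturb the watermarked text, keep a perturbation only if a quality oracle accepts it, and stop once the detector no longer flags the text. A minimal sketch of that loop, with `perturb`, `quality_ok`, and `detects_watermark` as hypothetical stand-ins for the paper's span-infilling perturbation oracle, quality oracle, and watermark detector:

```python
# Hedged sketch of the generic watermark-removal random walk (Algorithm 1).
# The three callables are placeholders, not the paper's implementations:
#   perturb(text)            -> candidate text (e.g. mask a span and re-infill it)
#   quality_ok(cand, orig)   -> quality oracle's accept/reject decision
#   detects_watermark(text)  -> watermark detector's verdict
def remove_watermark(text, perturb, quality_ok, detects_watermark, max_steps=300):
    current = text
    for _ in range(max_steps):
        if not detects_watermark(current):
            return current  # detector no longer fires: attack succeeded
        candidate = perturb(current)
        if quality_ok(candidate, text):
            current = candidate  # accept the step; otherwise stay put
    return current  # step budget exhausted (attack may have failed)
```

The step budget mirrors the "Attack steps" hyperparameter (200-300 in the paper's defaults); the real attack additionally randomizes span selection and infill length.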
Open Source Code Yes Experiments were all performed on 40 GiB A100s and our code is available at https://github.com/hlzhang109/impossibility-watermark.
Open Datasets Yes All three of the watermarking schemes we attack were originally tested on the Real News subset of the C4 dataset (Raffel et al., 2020), so we use this as our primary task as well.
Dataset Splits No The paper mentions using the C4 dataset and training models, but it does not provide specific details on training, validation, or test splits (e.g., percentages, sample counts, or explicit predefined splits for their experiments).
Hardware Specification Yes Experiments were all performed on 40 GiB A100s
Software Dependencies No The paper mentions software like Llama2-7B, T5-XL v1.1, and RoBERTa-v3 large, but it does not specify version numbers for general software dependencies or libraries (e.g., PyTorch version, TensorFlow version).
Experiment Setup Yes Specifically, we give Llama-2-7B (Touvron et al., 2023) the task of generating a completion given the first 20 tokens of a news article. Except where otherwise noted, we default to a generation length of 200 tokens. ... Table 4: Default hyperparameters of our attack for LM watermarks (per scheme: KGW / EXP / Unigram): Attack steps 200 / 300 / 300; Secret key 15485863 / 42 / 0; z stopping threshold 1.645; Max watermarked length {200, 512}; top-p of P 0.95; Span length l ∈ {4, 6, 8}; Num of spans 1; Min infill length l; Max infill length 1.5l.
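For reference, the Table 4 defaults can be collected into a plain config dict. The key names and grouping here are our own; the values follow the table, with per-scheme entries keyed by KGW / EXP / Unigram and shared values listed once:

```python
# Table 4 defaults as a config dict (key names are ours, values from the paper).
# The infill lengths are expressed relative to the sampled span length l.
ATTACK_DEFAULTS = {
    "attack_steps": {"KGW": 200, "EXP": 300, "Unigram": 300},
    "secret_key": {"KGW": 15485863, "EXP": 42, "Unigram": 0},
    "z_stopping_threshold": 1.645,
    "max_watermarked_length": [200, 512],
    "top_p": 0.95,                   # nucleus-sampling cutoff for the infill model
    "span_length_choices": [4, 6, 8],  # candidate values for l
    "num_spans": 1,
    "min_infill_length": "l",        # equal to the chosen span length
    "max_infill_length": "1.5*l",    # 1.5x the chosen span length
}
```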