On the Reliability of Watermarks for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer handwritten document. We find that watermarks remain detectable even after human and machine paraphrasing. Experimental Setup: Following Kirchenbauer et al. (2023), we use the Colossal Common Crawl Cleaned corpus (C4) dataset as a source of prompts for open-ended generation.
Researcher Affiliation Academia 1 University of Maryland 2 ELLIS Institute Tübingen, 3 Max-Planck Institute for Intelligent Systems, Tübingen AI Center 4 Scuola Normale Superiore di Pisa, 5 New York University
Pseudocode Yes Algorithm 1 Generalized Self Hash Watermark
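The paper's Algorithm 1 describes a self-hashing scheme in which the candidate token itself participates in seeding the green list that decides its own membership. A minimal sketch of that idea follows; the SHA-256 PRF, min-aggregation, and the direct membership test are illustrative assumptions, not the paper's exact construction:

```python
import hashlib

GAMMA = 0.25  # target fraction of the vocabulary that is "green"

def prf(*values) -> float:
    # Illustrative PRF (an assumption): hash the inputs, map to [0, 1).
    digest = hashlib.sha256(repr(values).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def is_green_selfhash(key: int, context: list, candidate: int,
                      gamma: float = GAMMA) -> bool:
    # Self-hash: the candidate token is included when computing the seed,
    # here via min-aggregation over (context token, candidate) pairs.
    seed = min(prf(key, t, candidate) for t in context)
    # The candidate is "green" if it pseudorandomly falls below gamma.
    return prf(seed, candidate) < gamma
```

At generation time, green candidates receive a positive logit bias, so sampled text over-represents green tokens; a detector holding the key recomputes exactly this membership test.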
Open Source Code Yes The implementation is designed to be both useable and extensible for further research and can be found at github.com/jwkirchenbauer/lm-watermarking.
Open Datasets Yes Following Kirchenbauer et al. (2023), we use the Colossal Common Crawl Cleaned corpus (C4) dataset as a source of prompts for open-ended generation. For the human study, we adopt the Long-Form Question Answering (LFQA) dataset curated by Krishna et al. (2023) based on a selection of posts and responses to questions on Reddit's Explain Like I'm Five (ELI5) forum. The Github, Law, Med, and Patents markers refer to the github, free law, pubmed, and uspto subsets of The Pile (Gao et al., 2020) dataset as hosted on the huggingface hub at huggingface.co/datasets/EleutherAI/pile by EleutherAI. Wiki indicates samples from the training split of the Wikitext-103 dataset (Merity et al., 2016), also hosted on the huggingface hub at huggingface.co/datasets/wikitext as wikitext-103-raw-v1.
Dataset Splits No The paper does not explicitly mention training, validation, and test dataset splits with percentages or sample counts. While it discusses data used for experiments and evaluation, it does not specify a separate validation split or its size.
Hardware Specification Yes The experiments performed in the study were all inference-based and therefore could be run on a single Nvidia RTX A4000/A5000/A6000 GPU. Additionally, the Dipper model and DetectGPT model were both run on A6000 cards due to the memory footprint required by their larger parameter counts.
Software Dependencies No The paper mentions using 'open source software' and specific libraries like 'huggingface datasets library' and 'huggingface transformers modelling framework' but does not provide specific version numbers for any of these software components.
Experiment Setup Yes We use a single set of language model sampling parameters across all experiments: multinomial sampling at temperature 0.7. Unless explicitly stated, we use the Left Hash watermark scheme based on an additive PRF with context window h = 1 and (γ, δ) = (0.25, 2.0).
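The reported configuration (Left Hash, h = 1, γ = 0.25, δ = 2.0) can be sketched as below. The SHA-256 PRF, the secret key value, the uniform base logits, and the candidate-set sampling loop are assumptions for illustration; detection uses the standard one-proportion z-test on the green-token count:

```python
import hashlib
import math
import random

GAMMA, DELTA = 0.25, 2.0
KEY = 15485863  # arbitrary example secret key (an assumption)

def is_green(prev_token: int, token: int, gamma: float = GAMMA) -> bool:
    # Left Hash with context window h = 1: the green list at position t is
    # seeded only by the token at position t-1, together with the secret key.
    digest = hashlib.sha256(f"{KEY}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def watermarked_sample(prev_token: int, candidates: list,
                       rng: random.Random) -> int:
    # Add delta to the logits of green candidates (base logits are taken
    # as uniform here for illustration), then sample from the softmax.
    logits = [DELTA if is_green(prev_token, c) else 0.0 for c in candidates]
    weights = [math.exp(l) for l in logits]
    return rng.choices(candidates, weights=weights, k=1)[0]

def z_score(tokens: list, gamma: float = GAMMA) -> float:
    # One-proportion z-test over the T scored positions:
    # z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma))
    T = len(tokens) - 1
    greens = sum(is_green(tokens[i - 1], tokens[i])
                 for i in range(1, len(tokens)))
    return (greens - gamma * T) / math.sqrt(T * gamma * (1 - gamma))
```

Text sampled with the δ bias yields a large z-score, while unwatermarked token sequences score near zero, which is the detection statistic the robustness experiments threshold.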