Provable Robust Watermarking for AI-Generated Text

Authors: Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, Yu-Xiang Wang

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on three varying LLMs and two datasets verify that our UNIGRAM-WATERMARK achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs. Code is available at https://github.com/XuandongZhao/Unigram-Watermark. In this section, we aim to conduct experiments to evaluate watermark detection performance, watermarked text quality, and robustness against attacks compared to the baseline. Additional experiment results including different parameters, white-box attacks, scaled language models, etc. are deferred to Appendix B.
Researcher Affiliation Academia Xuandong Zhao Prabhanjan Ananth Lei Li Yu-Xiang Wang UC Santa Barbara {xuandongzhao,prabhanjan,leili,yuxiangw}@cs.ucsb.edu
Pseudocode Yes Pseudocode for our approach, Watermark and Detect, is provided in Algorithms 1 and 2.
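The Watermark/Detect pair referenced above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the helper names (`green_list`, `watermark_logits`, `detect`) are hypothetical, and it assumes the scheme's core idea of one fixed key-seeded green list shared across all positions, with detection via a z-test on the green-token count.

```python
import math
import random


def green_list(key: int, vocab_size: int, gamma: float) -> set:
    # One fixed green list for all positions, derived from the secret key.
    rng = random.Random(key)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits: list, green: set, delta: float) -> list:
    # Watermark: add bias delta to every green-list token's logit
    # before sampling the next token.
    return [l + delta if i in green else l for i, l in enumerate(logits)]


def detect(token_ids: list, green: set, gamma: float) -> float:
    # Detect: z-score of the observed green-token count against the
    # null hypothesis that each token is green with probability gamma.
    n = len(token_ids)
    g = sum(1 for t in token_ids if t in green)
    return (g - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A text whose tokens fall in the green list far more often than the expected fraction γ yields a large z-score and is flagged as watermarked.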
Open Source Code Yes Code is available at https://github.com/XuandongZhao/Unigram-Watermark.
Open Datasets Yes We utilize two long-form text datasets: OpenGen and LFQA. OpenGen, collected by Krishna et al. (2023), consists of 3K two-sentence chunks sampled from the validation split of WikiText-103 (Merity et al., 2017). LFQA is a long-form question-answering dataset created by Krishna et al. (2023) by scraping questions from Reddit, posted between July and December 2021, across six domains.
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits. It mentions using the "validation split of Wiki Text-103" for collecting prompts but does not specify how its own generated data was split for training, validation, or testing of their watermark detection model itself.
Hardware Specification Yes The experiments are conducted on Nvidia A100 GPUs.
Software Dependencies No The paper mentions using "Huggingface library (Wolf et al., 2019)" but does not specify a version number for this or any other software dependency necessary for reproduction. It also mentions GPT3 (text-davinci-003) for perplexity evaluation but without versioning.
Experiment Setup Yes We use a watermark strength of δ = 2.0 and a green list ratio of γ = 0.5. We also use different watermark keys k for different models. Nucleus Sampling (Holtzman et al., 2020) is employed as the default decoding algorithm to introduce randomness while maintaining human-like text output.
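A toy sketch of how the reported settings (δ = 2.0, γ = 0.5, nucleus sampling) compose: the green-list bias is added to the logits first, and nucleus (top-p) sampling then draws from the biased distribution. The 4-token vocabulary, the green set `{0, 2}`, and the helper name `nucleus_sample` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def nucleus_sample(logits: list, p: float = 0.95, seed: int = 42) -> int:
    # Nucleus (top-p) sampling: keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches p, then sample from that set.
    rng = random.Random(seed)
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    z = sum(probs)
    probs = [q / z for q in probs]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]


# Apply the watermark bias (delta = 2.0, gamma = 0.5) before sampling.
delta = 2.0
green = {0, 2}  # hypothetical green list over a 4-token toy vocabulary
logits = [1.0, 1.0, 1.0, 1.0]
biased = [l + delta if i in green else l for i, l in enumerate(logits)]
next_token = nucleus_sample(biased, p=0.95)
```

Because the bias is applied to logits rather than to a specific decoding rule, the same watermarking step works unchanged with greedy decoding, beam search, or any sampling scheme.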