Provable Robust Watermarking for AI-Generated Text
Authors: Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, Yu-Xiang Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three varying LLMs and two datasets verify that our UNIGRAM-WATERMARK achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs. Code is available at https://github.com/XuandongZhao/Unigram-Watermark. In this section, we aim to conduct experiments to evaluate watermark detection performance, watermarked text quality, and robustness against attacks compared to the baseline. Additional experiment results including different parameters, white-box attacks, scaled language models, etc. are deferred to Appendix B. |
| Researcher Affiliation | Academia | Xuandong Zhao Prabhanjan Ananth Lei Li Yu-Xiang Wang UC Santa Barbara {xuandongzhao,prabhanjan,leili,yuxiangw}@cs.ucsb.edu |
| Pseudocode | Yes | Pseudocodes of our approach Watermark and Detect are provided in Algorithm 1 and 2. |
| Open Source Code | Yes | Code is available at https://github.com/XuandongZhao/Unigram-Watermark. |
| Open Datasets | Yes | We utilize two long-form text datasets: OpenGen and LFQA. OpenGen, collected by Krishna et al. (2023), consists of 3K two-sentence chunks sampled from the validation split of WikiText-103 (Merity et al., 2017). LFQA is a long-form question-answering dataset created by Krishna et al. (2023) by scraping questions from Reddit, posted between July and December 2021, across six domains. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using the "validation split of WikiText-103" as a source of prompts but does not specify how its own generated data was partitioned for evaluating the watermark detector. |
| Hardware Specification | Yes | The experiments are conducted on Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions using "Huggingface library (Wolf et al., 2019)" but does not specify a version number for this or any other software dependency necessary for reproduction. It also mentions GPT3 (text-davinci-003) for perplexity evaluation but without versioning. |
| Experiment Setup | Yes | We use a watermark strength of δ = 2.0 and a green list ratio of γ = 0.5. We also use different watermark keys k for different models. Nucleus Sampling (Holtzman et al., 2020) is employed as the default decoding algorithm to introduce randomness while maintaining human-like text output. |
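The scheme the table refers to (a fixed "green list" covering a γ fraction of the vocabulary, a logit boost of δ = 2.0 during generation, and detection via a one-proportion z-test on green-token counts) can be sketched roughly as below. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the helper names, the shuffle-based key derivation, and the plain-list logit representation are all assumptions made for the sketch.

```python
import math
import random


def green_list(vocab_size: int, key: int, gamma: float = 0.5) -> set:
    """Fixed (prompt-independent) green list: a keyed random split of the
    vocabulary. The paper uses a different watermark key k per model."""
    rng = random.Random(key)  # key derivation here is an illustrative choice
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits: list, green: set, delta: float = 2.0) -> list:
    """Watermarking step: add the strength delta to green-token logits
    before sampling (e.g. with nucleus sampling)."""
    return [l + delta if i in green else l for i, l in enumerate(logits)]


def detect_z(token_ids: list, green: set, gamma: float = 0.5) -> float:
    """Detection step: one-proportion z-test on the green-token count.
    A large z-score indicates the text is likely watermarked."""
    n = len(token_ids)
    hits = sum(1 for t in token_ids if t in green)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

With γ = 0.5, unwatermarked text lands near z ≈ 0, while the δ boost pushes generated text toward green tokens and drives z well above typical detection thresholds.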