Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Provable Robust Watermarking for AI-Generated Text
Authors: Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, Yu-Xiang Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three varying LLMs and two datasets verify that our UNIGRAM-WATERMARK achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs. Code is available at https://github.com/XuandongZhao/Unigram-Watermark. In this section, we aim to conduct experiments to evaluate watermark detection performance, watermarked text quality, and robustness against attacks compared to the baseline. Additional experiment results including different parameters, white-box attacks, scaled language models, etc. are deferred to Appendix B. |
| Researcher Affiliation | Academia | Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang (UC Santa Barbara) |
| Pseudocode | Yes | Pseudocodes of our approach Watermark and Detect are provided in Algorithm 1 and 2. |
| Open Source Code | Yes | Code is available at https://github.com/XuandongZhao/Unigram-Watermark. |
| Open Datasets | Yes | We utilize two long-form text datasets: OpenGen and LFQA. OpenGen, collected by Krishna et al. (2023), consists of 3K two-sentence chunks sampled from the validation split of WikiText-103 (Merity et al., 2017). LFQA is a long-form question-answering dataset created by Krishna et al. (2023) by scraping questions from Reddit, posted between July and December 2021, across six domains. |
| Dataset Splits | No | The paper does not explicitly provide training, validation, or test splits. It mentions using the "validation split of WikiText-103" as the source of its prompts, but does not specify how its own generated data was split for training, validating, or testing the watermark detector. |
| Hardware Specification | Yes | The experiments are conducted on Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions using "Huggingface library (Wolf et al., 2019)" but does not specify a version number for this or any other software dependency needed for reproduction. It also mentions GPT-3 (text-davinci-003) for perplexity evaluation, again without further version information. |
| Experiment Setup | Yes | We use a watermark strength of δ = 2.0 and a green list ratio of γ = 0.5. We also use different watermark keys k for different models. Nucleus Sampling (Holtzman et al., 2020) is employed as the default decoding algorithm to introduce randomness while maintaining human-like text output. |
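
To make the reported setup concrete, the following is a minimal sketch of how a watermark strength δ = 2.0, a green list ratio γ = 0.5, and a watermark key k could interact. It assumes a fixed, key-seeded green list and a z-score detector of the usual form (green count minus γn, divided by sqrt(nγ(1-γ))); it is not the authors' implementation. All function names are hypothetical, the uniform toy logits stand in for a real LLM, and plain softmax sampling replaces the nucleus sampling used in the paper.

```python
import math
import random


def make_green_mask(vocab_size, gamma, key):
    """Hypothetical helper: pick a key-seeded subset of gamma * vocab_size token ids."""
    rng = random.Random(key)
    green_ids = set(rng.sample(range(vocab_size), int(gamma * vocab_size)))
    return [i in green_ids for i in range(vocab_size)]


def watermark_logits(logits, green_mask, delta):
    """Add delta to the logit of every green token before sampling."""
    return [l + delta if g else l for l, g in zip(logits, green_mask)]


def z_score(token_ids, green_mask, gamma):
    """Standardized green-token count; large values suggest watermarked text."""
    n = len(token_ids)
    green = sum(green_mask[t] for t in token_ids)
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def sample(logits, rng):
    """Plain softmax sampling (an assumption; the paper uses nucleus sampling)."""
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy setup: the values of delta and gamma follow the table; the rest is illustrative.
VOCAB, GAMMA, DELTA, KEY = 1000, 0.5, 2.0, 42
mask = make_green_mask(VOCAB, GAMMA, KEY)
rng = random.Random(0)
base_logits = [0.0] * VOCAB  # stand-in for a language model's next-token logits

plain = [sample(base_logits, rng) for _ in range(200)]
marked = [sample(watermark_logits(base_logits, mask, DELTA), rng) for _ in range(200)]

z_plain = z_score(plain, mask, GAMMA)
z_marked = z_score(marked, mask, GAMMA)
```

In this toy setting the watermarked sample should score far above the unwatermarked one, because δ shifts probability mass toward green tokens while the detector only needs the key to rebuild the same green list.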