Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Robust Distortion-free Watermarks for Language Models
Authors: Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these watermarks to three language models (OPT-1.3B, LLaMA-7B, and Alpaca-7B) to experimentally validate their statistical power and robustness to various paraphrasing attacks. We experimentally validate the statistical power of our watermarking strategies (i.e., ITS, ITS-edit, EXP, and EXP-edit) via experiments with the OPT-1.3B (Zhang et al., 2022) and LLaMA-7B (Touvron et al., 2023) models. |
| Researcher Affiliation | Academia | Rohith Kuditipudi EMAIL Department of Computer Science Stanford University John Thickstun EMAIL Department of Computer Science Stanford University Tatsunori Hashimoto EMAIL Department of Computer Science Stanford University Percy Liang EMAIL Department of Computer Science Stanford University |
| Pseudocode | Yes | Algorithm 1: Watermarked text generation (generate); Algorithm 2: Watermarked text detection (detect); Algorithm 3: Test statistic (ϕ); Algorithm 4: Randomized watermarked text generation (shift-generate) |
| Open Source Code | Yes | We release all code publicly at https://github.com/jthickstun/watermark. |
| Open Datasets | Yes | We run experiments using generate rather than shift-generate, mainly for the sake of reproducibility; recall however that this choice has no impact on the p-values we report. We test for all watermarks using a block size k (in Algorithm 3) equal to the length m of the text. Following the methodology of Kirchenbauer et al. (2023), we generate watermarked text continuations of prompts sampled from the news-like subset of the C4 dataset (Raffel et al., 2020). Finally, using the Alpaca-7B model and evaluation dataset Taori et al. (2023), we conduct a case-study on the feasibility of watermarking the responses of a performant instruction following model. |
| Dataset Splits | No | The paper mentions using |
| Hardware Specification | Yes | We report average runtimes and the associated standard deviations across 5 calls on an Apple M2 Macbook Pro Laptop. We include benchmarking scripts with our code release. |
| Software Dependencies | No | The paper references specific models used in the experiments, such as OPT-1.3B (Zhang et al., 2022), LLaMA-7B (Touvron et al., 2023), and Alpaca-7B (Taori et al., 2023), as well as the OPUS-MT collection of translation models with specific Huggingface Hub identifiers. However, it does not provide specific version numbers for general software dependencies such as programming languages (e.g., Python) or deep learning frameworks (e.g., PyTorch, TensorFlow) used to implement their methodology. |
| Experiment Setup | Yes | We test for all watermarks using a block size k (in Algorithm 3) equal to the length m of the text. Following the methodology of Kirchenbauer et al. (2023), we generate watermarked text continuations of prompts sampled from the news-like subset of the C4 dataset (Raffel et al., 2020). We vary the generation length m (Experiment 1) and the random number sequence length n (Experiment 2), and we report median p-values of watermarked text over 500 samples. In all our experiments (except for Experiment 2, where the control variable n is a hyperparameter unique to our watermarks), we also replicate the watermark of Kirchenbauer et al. (2023) as a baseline, setting the greenlist fraction γ = 0.25 and varying the logit bias δ ∈ {1.0, 2.0}. |
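The baseline setup quoted above follows the greenlist scheme of Kirchenbauer et al. (2023): at each step, a pseudorandom γ-fraction of the vocabulary is marked "green" and its logits are boosted by δ before sampling. The sketch below illustrates that logit-biasing step with the reported hyperparameters (γ = 0.25, δ ∈ {1.0, 2.0}); the function name, the choice of NumPy, and seeding the generator directly on the previous token ID are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def greenlist_bias(logits, prev_token, gamma=0.25, delta=2.0):
    """Sketch of the greenlist watermark baseline (Kirchenbauer et al., 2023):
    seed a PRNG on the previous token, designate a gamma-fraction of the
    vocabulary as the greenlist, and add delta to those tokens' logits.
    Detection (not shown) counts how often sampled tokens land in the
    greenlist, which is recomputable from the same seed."""
    vocab_size = len(logits)
    rng = np.random.default_rng(prev_token)        # hash of context in practice
    perm = rng.permutation(vocab_size)             # pseudorandom vocab ordering
    greenlist = perm[: int(gamma * vocab_size)]    # first gamma fraction is green
    biased = np.asarray(logits, dtype=float).copy()
    biased[greenlist] += delta                     # boost greenlist tokens
    return biased, set(int(t) for t in greenlist)
```

Because the greenlist depends only on the seed (here, the previous token), a detector with the same key can recompute it per position without access to the model, which is what makes the scheme a useful baseline against the paper's distortion-free watermarks.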