Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robust Distortion-free Watermarks for Language Models

Authors: Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply these watermarks to three language models (OPT-1.3B, LLaMA-7B, and Alpaca-7B) to experimentally validate their statistical power and robustness to various paraphrasing attacks. We experimentally validate the statistical power of our watermarking strategies (i.e., ITS, ITS-edit, EXP, and EXP-edit) via experiments with the OPT-1.3B (Zhang et al., 2022) and LLaMA-7B (Touvron et al., 2023) models.
Researcher Affiliation | Academia | Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang (Department of Computer Science, Stanford University)
Pseudocode | Yes | Algorithm 1: Watermarked text generation (generate); Algorithm 2: Watermarked text detection (detect); Algorithm 3: Test statistic (ϕ); Algorithm 4: Randomized watermarked text generation (shift-generate)
Open Source Code | Yes | We release all code publicly at https://github.com/jthickstun/watermark.
Open Datasets | Yes | We run experiments using generate rather than shift-generate, mainly for the sake of reproducibility; recall however that this choice has no impact on the p-values we report. We test for all watermarks using a block size k (in Algorithm 3) equal to the length m of the text. Following the methodology of Kirchenbauer et al. (2023), we generate watermarked text continuations of prompts sampled from the news-like subset of the C4 dataset (Raffel et al., 2020). Finally, using the Alpaca-7B model and evaluation dataset (Taori et al., 2023), we conduct a case study on the feasibility of watermarking the responses of a performant instruction-following model.
Dataset Splits | No | The paper mentions using
Hardware Specification | Yes | We report average runtimes and the associated standard deviations across 5 calls on an Apple M2 MacBook Pro laptop. We include benchmarking scripts with our code release.
Software Dependencies | No | The paper references the specific models used in the experiments, such as OPT-1.3B (Zhang et al., 2022), LLaMA-7B (Touvron et al., 2023), and Alpaca-7B (Taori et al., 2023), as well as the OPUS-MT collection of translation models with specific Hugging Face Hub identifiers. However, it does not provide version numbers for general software dependencies such as programming languages (e.g., Python) or deep learning frameworks (e.g., PyTorch, TensorFlow) used to implement the methodology.
Experiment Setup | Yes | We test for all watermarks using a block size k (in Algorithm 3) equal to the length m of the text. Following the methodology of Kirchenbauer et al. (2023), we generate watermarked text continuations of prompts sampled from the news-like subset of the C4 dataset (Raffel et al., 2020). We vary the generation length m (Experiment 1) and the random number sequence length n (Experiment 2), and we report median p-values of watermarked text over 500 samples. In all our experiments (except for Experiment 2, where the control variable n is a hyperparameter unique to our watermarks), we also replicate the watermark of Kirchenbauer et al. (2023) as a baseline, setting the greenlist fraction γ = 0.25 and varying the logit bias δ ∈ {1.0, 2.0}.
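The generate/detect pairing listed under Pseudocode can be illustrated with a minimal sketch of exponential-minimum (EXP-style) sampling: each token is a deterministic function of the model's next-token distribution and a key-derived sequence of uniform variates, yet the marginal output distribution is unchanged. The function names below are hypothetical and the code is a sketch of the general technique, not the authors' implementation.

```python
import random

def exp_sample(probs, uniforms):
    """Pick the token t maximizing uniforms[t] ** (1 / probs[t]).

    With independent u_t ~ Uniform(0, 1), argmax_t u_t^(1/p_t) is
    distributed exactly according to p, so sampling is distortion-free,
    yet the choice is a deterministic function of the shared key stream.
    """
    best, best_score = None, -1.0
    for t, (p, u) in enumerate(zip(probs, uniforms)):
        if p > 0:
            score = u ** (1.0 / p)
            if score > best_score:
                best, best_score = t, score
    return best

# A shared key seeds the uniforms; detection recomputes them from the
# same key and tests whether observed tokens score improbably high.
rng = random.Random(42)  # stand-in for the watermark key
probs = [0.1, 0.6, 0.3]
uniforms = [rng.random() for _ in probs]
token = exp_sample(probs, uniforms)
```

Because argmax_t u_t^(1/p_t) follows the model's distribution exactly, repeated sampling with fresh uniforms reproduces p, which is the distortion-free property the paper's EXP strategy is built on.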
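The Kirchenbauer et al. (2023) baseline in the Experiment Setup row can be sketched as follows: a pseudorandom partition of the vocabulary, seeded by the previous token, marks a fraction γ of tokens "green", and each green logit is boosted by δ before sampling. The toy vocabulary size and function names are illustrative assumptions, not the authors' replication code.

```python
import math
import random

def greenlist(prev_token, vocab_size, gamma, key=0):
    """Pseudorandomly select a fraction gamma of the vocabulary,
    seeded by the previous token and a secret key."""
    rng = random.Random(key * 1_000_003 + prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def biased_logits(logits, prev_token, gamma=0.25, delta=2.0, key=0):
    """Add the bias delta to every green-listed logit (soft watermark)."""
    green = greenlist(prev_token, len(logits), gamma, key)
    return [l + delta if i in green else l for i, l in enumerate(logits)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# gamma = 0.25 and delta = 2.0 match the baseline settings quoted above;
# green tokens become more likely, which detection later counts.
logits = [0.0] * 8
probs = softmax(biased_logits(logits, prev_token=3, gamma=0.25, delta=2.0))
```

Detection for this scheme counts how many generated tokens fall in their greenlists and applies a one-sided test against the fraction γ expected by chance; unlike the paper's schemes, the bias δ distorts the output distribution.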
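The runtime figures in the Hardware Specification row (means and standard deviations over 5 calls) follow a standard pattern that a small timing harness can reproduce. The workload below is a stand-in for illustration; the authors' actual benchmarking scripts ship with their code release.

```python
import statistics
import time

def benchmark(fn, *args, calls=5):
    """Return (mean, stdev) of wall-clock runtime in seconds across calls."""
    times = []
    for _ in range(calls):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

# Stand-in workload; the paper times watermark generation and detection.
mean, std = benchmark(sorted, list(range(100_000)))
```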