On the Learnability of Watermarks for Language Models

Authors: Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our approach on three decoding-based watermarking strategies and various hyperparameter settings, finding that models can learn to generate watermarked text with high detectability.
Researcher Affiliation | Academia | Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto (Stanford University); {cygu, xlisali, thashim}@stanford.edu, pliang@cs.stanford.edu
Pseudocode | No | The paper describes its methods (logit-based and sampling-based watermark distillation) in prose and with mathematical equations, but it does not include any structured pseudocode or algorithm blocks. (A hedged sketch of the logit-based objective appears below the table.)
Open Source Code | Yes | In addition, we release code and scripts to reproduce experiments at https://github.com/chenchenygu/watermark-learnability, along with trained model weights.
Open Datasets | Yes | For logit-based watermark distillation, we use Llama 2 7B (Touvron et al., 2023) as both the teacher and student models... We distill using a subset of OpenWebText (Gokaslan et al., 2019) for 5,000 steps... We evaluate on generations prompted by prefixes from the RealNewsLike subset of the C4 dataset (Raffel et al., 2020).
Dataset Splits | Yes | We distill using a subset of OpenWebText (Gokaslan et al., 2019) for 5,000 steps with a batch size of 64 sequences, sequence length of 512 tokens, maximal learning rate of 1e-5, and cosine learning rate decay with a linear warmup. ... We evaluate on generations prompted by prefixes from the RealNewsLike subset of the C4 dataset (Raffel et al., 2020). For each decoding-based watermarking strategy and distilled model, we generate 5,000 200-token completions from 50-token prompts from the validation split.
Hardware Specification | Yes | Each training run took approximately 6 hours on 4 NVIDIA A100 80GB GPUs. (Appendix E.1) ... Each training run took approximately 3 hours on 1 NVIDIA A100 80GB GPU. (Appendix F)
Software Dependencies | No | The paper mentions models (e.g., 'Llama 2 7B', 'Pythia 1.4B') and optimizers ('AdamW optimizer') but does not specify version numbers for key software libraries or dependencies (e.g., 'PyTorch 1.x', 'Hugging Face Transformers 4.x').
Experiment Setup | Yes | We distill using a subset of OpenWebText (Gokaslan et al., 2019) for 5,000 steps with a batch size of 64 sequences, sequence length of 512 tokens, maximal learning rate of 1e-5, and cosine learning rate decay with a linear warmup. (Illustrative training and evaluation sketches follow the table.)
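
As the Pseudocode row notes, the distillation objectives are given only in prose and equations. The following is a minimal PyTorch sketch of a logit-based watermark distillation loss under a KGW-style green-list watermark; the helper names (`green_list_mask`, `logit_distillation_loss`), the seeding scheme, and the default `gamma`/`delta` values are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def green_list_mask(prev_tokens: torch.Tensor, vocab_size: int,
                    gamma: float = 0.25, seed: int = 42) -> torch.Tensor:
    """Mark a pseudorandom gamma-fraction of the vocabulary as 'green',
    seeded by each sequence's previous token (KGW-style; seeding is assumed)."""
    masks = []
    for tok in prev_tokens.tolist():
        gen = torch.Generator().manual_seed(seed * (tok + 1))
        perm = torch.randperm(vocab_size, generator=gen)
        mask = torch.zeros(vocab_size)
        mask[perm[: int(gamma * vocab_size)]] = 1.0
        masks.append(mask)
    return torch.stack(masks)  # shape: (batch, vocab_size)

def logit_distillation_loss(student_logits: torch.Tensor,
                            teacher_logits: torch.Tensor,
                            prev_tokens: torch.Tensor,
                            delta: float = 2.0) -> torch.Tensor:
    """KL divergence from the teacher's watermarked next-token distribution
    to the student's distribution, for one position per sequence."""
    mask = green_list_mask(prev_tokens, teacher_logits.size(-1)).to(teacher_logits)
    watermarked = teacher_logits + delta * mask          # bias green-list tokens
    target = F.softmax(watermarked, dim=-1)              # watermarked teacher distribution
    log_student = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_student, target, reduction="batchmean")
```

Summing a loss of this shape over the positions of a batch of OpenWebText sequences and backpropagating through the student is the general form of the logit-based recipe; per the paper, the sampling-based variant instead fine-tunes the student on watermarked text sampled from the teacher using an ordinary language-modeling loss.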
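The Dataset Splits and Experiment Setup rows quote only aggregate training hyperparameters. As a rough illustration, they map onto Hugging Face `TrainingArguments` as follows; the per-device/gradient-accumulation split, warmup length, and precision are assumptions not stated in the quoted text.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7b-watermark-distill",  # assumed name
    max_steps=5_000,                  # 5,000 distillation steps
    per_device_train_batch_size=4,    # assumed split: 4 GPUs x 4 sequences x 4 accumulation
    gradient_accumulation_steps=4,    #   = 64 sequences per optimizer step
    learning_rate=1e-5,               # maximal learning rate
    lr_scheduler_type="cosine",       # cosine decay ...
    warmup_steps=500,                 # ... with a linear warmup (length assumed)
    bf16=True,                        # precision not stated in the quote; assumed
)
```

The 512-token sequence length would be enforced when tokenizing the OpenWebText subset rather than in `TrainingArguments`.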
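The evaluation protocol quoted under Dataset Splits (5,000 completions of 200 tokens from 50-token prompts drawn from the C4 RealNewsLike validation split) could be approximated as below; the Hub identifiers, streaming access, and sampling settings are assumptions, and the watermark generation and detection logic from the authors' repository is omitted.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # assumed checkpoint id (or a distilled student)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

c4 = load_dataset("allenai/c4", "realnewslike", split="validation", streaming=True)

completions = []
for i, example in enumerate(c4):
    if i >= 5000:                          # 5,000 completions per strategy/model
        break
    prompt_ids = tokenizer(example["text"], return_tensors="pt",
                           truncation=True, max_length=50).input_ids.to(model.device)
    out = model.generate(prompt_ids, max_new_tokens=200, do_sample=True)
    completions.append(tokenizer.decode(out[0, prompt_ids.shape[1]:],
                                        skip_special_tokens=True))
```

Each completion would then be scored with the corresponding watermark detector to measure detectability.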