On the Learnability of Watermarks for Language Models
Authors: Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our approach on three decoding-based watermarking strategies and various hyperparameter settings, finding that models can learn to generate watermarked text with high detectability. |
| Researcher Affiliation | Academia | Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto; Stanford University; {cygu, xlisali, thashim}@stanford.edu, pliang@cs.stanford.edu |
| Pseudocode | No | The paper describes its methods (logit-based and sampling-based watermark distillation) in prose and with mathematical equations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | In addition, we release code and scripts to reproduce experiments at https://github.com/chenchenygu/watermark-learnability, along with trained model weights. |
| Open Datasets | Yes | For logit-based watermark distillation, we use Llama 2 7B (Touvron et al., 2023) as both the teacher and student models... We distill using a subset of OpenWebText (Gokaslan et al., 2019) for 5,000 steps... We evaluate on generations prompted by prefixes from the RealNewsLike subset of the C4 dataset (Raffel et al., 2020). (See the evaluation generation sketch below the table.) |
| Dataset Splits | Yes | We distill using a subset of OpenWebText (Gokaslan et al., 2019) for 5,000 steps with a batch size of 64 sequences, sequence length of 512 tokens, maximal learning rate of 1e-5, and cosine learning rate decay with a linear warmup. ... We evaluate on generations prompted by prefixes from the RealNewsLike subset of the C4 dataset (Raffel et al., 2020). For each decoding-based watermarking strategy and distilled model, we generate 5,000 200-token completions from 50-token prompts from the validation split. |
| Hardware Specification | Yes | Each training run took approximately 6 hours on 4 NVIDIA A100 80GB GPUs. (Appendix E.1) ... Each training run took approximately 3 hours on 1 NVIDIA A100 80GB GPU. (Appendix F) |
| Software Dependencies | No | The paper mentions models (e.g., 'Llama 2 7B', 'Pythia 1.4B') and the AdamW optimizer, but it does not specify version numbers for key software libraries or dependencies (e.g., 'PyTorch 1.x', 'Hugging Face Transformers 4.x'). |
| Experiment Setup | Yes | We distill using a subset of OpenWebText (Gokaslan et al., 2019) for 5,000 steps with a batch size of 64 sequences, sequence length of 512 tokens, maximal learning rate of 1e-5, and cosine learning rate decay with a linear warmup. (See the training configuration sketch below the table.) |
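
The hyperparameters quoted under Dataset Splits and Experiment Setup map onto a standard Hugging Face training configuration. The sketch below is illustrative only: it mirrors the reported optimizer settings (5,000 steps, 64-sequence batches of 512 tokens, peak learning rate 1e-5, cosine decay with linear warmup, AdamW), but the per-device batch size / gradient-accumulation split, the warmup length, the precision setting, and the watermark-distillation loss itself are assumptions not stated in the table above.

```python
# Illustrative sketch of the reported logit-based distillation training setup.
# Assumptions (not from the quotes above): the 64-sequence batch is realized as
# 4 GPUs x 4 per-device sequences x 4 accumulation steps, and warmup lasts 500 steps.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # teacher and student are both Llama 2 7B (gated weights)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
student = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
# The OpenWebText subset would be tokenized into 512-token training sequences
# ("sequence length of 512 tokens") before being passed to the trainer.

training_args = TrainingArguments(
    output_dir="watermark-distillation",
    max_steps=5_000,                  # "for 5,000 steps"
    per_device_train_batch_size=4,    # assumed split of the 64-sequence batch
    gradient_accumulation_steps=4,    # 4 GPUs x 4 x 4 = 64 sequences per update
    learning_rate=1e-5,               # "maximal learning rate of 1e-5"
    lr_scheduler_type="cosine",       # "cosine learning rate decay"
    warmup_steps=500,                 # "linear warmup" (length assumed)
    optim="adamw_torch",              # AdamW optimizer
    bf16=True,                        # typical for A100 training (assumed)
    logging_steps=50,
)

# The actual objective is logit-based watermark distillation: the student is trained to
# match the teacher's watermarked next-token distribution (e.g., a KL term against
# decoding-time-watermarked teacher logits). That custom loss would be supplied via a
# Trainer subclass and is intentionally omitted from this sketch.
```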
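
The evaluation quoted under Open Datasets and Dataset Splits (5,000 completions of 200 tokens from 50-token prompts drawn from the C4 RealNewsLike validation split) can likewise be sketched with the Hugging Face `datasets` and `generate` APIs. The dataset identifier, sampling settings, and prompt handling below are assumptions for illustration; only the split, the prompt/completion lengths, and the 5,000 count come from the table above.

```python
# Illustrative sketch of the evaluation generation setup: 200-token completions
# from 50-token prompts taken from the C4 RealNewsLike validation split.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # placeholder; use the distilled student model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

# Stream the RealNewsLike validation split so the full corpus is never downloaded.
c4 = load_dataset("allenai/c4", "realnewslike", split="validation", streaming=True)

completions = []
for i, example in enumerate(c4):
    if i >= 5_000:  # "we generate 5,000 ... completions"
        break
    # Truncate each document to a 50-token prompt.
    prompt_ids = tokenizer(
        example["text"], truncation=True, max_length=50, return_tensors="pt"
    ).input_ids
    with torch.no_grad():
        output_ids = model.generate(
            prompt_ids,
            max_new_tokens=200,  # 200-token completions
            do_sample=True,      # sampling settings assumed, not reported in the table
        )
    completions.append(tokenizer.decode(output_ids[0, prompt_ids.shape[1]:]))

# Watermark detectability would then be measured on `completions` with the
# detector of the corresponding decoding-based watermarking strategy (not shown).
```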