Watermarking Makes Language Models Radioactive
Authors: Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the radioactivity of text generated by large language models (LLMs), i.e., whether it is possible to detect that such synthetic input was used to train a subsequent LLM. Our new methods, specialized for radioactivity, detect with provable confidence weak residuals of the watermark signal in the fine-tuned LLM. (A simplified version of such a detection test is sketched after the table.) |
| Researcher Affiliation | Collaboration | Tom Sander (Meta FAIR & École polytechnique), Pierre Fernandez (Meta FAIR & Inria Rennes), Alain Durmus (École polytechnique), Matthijs Douze (Meta FAIR), Teddy Furon (Inria Rennes) |
| Pseudocode | No | The paper describes methods and procedures in narrative text and figures but does not contain a formally structured or labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Radioactivity detection code is available at https://github.com/facebookresearch/radioactive-watermark |
| Open Datasets | Yes | The first fine-tuning is done with the setup presented in Sec. 5, with ρ = 10% of watermarked data, and the second on OASST1 [Köpf et al., 2024]. |
| Dataset Splits | No | The paper describes the datasets used for fine-tuning and evaluation benchmarks, but it does not specify explicit training/validation/test splits for its own generated instruction/answer pairs or the benchmarks in a way that allows reproduction of data partitioning for these specific experiments. |
| Hardware Specification | Yes | For our experiments, we utilized an internal cluster. ... on a single node equipped with 8 V100 GPUs. ... on a single V100 GPU. |
| Software Dependencies | No | The paper mentions optimizers and sampling methods, but it does not provide specific version numbers for software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, or scikit-learn versions) required to replicate the experiment. |
| Experiment Setup | Yes | We use AdamW [Loshchilov and Hutter, 2017a] for 3000 steps, with a batch size of 8, a learning rate of 10⁻⁵ and a context size of 2048 tokens (which results in 3 training epochs). The learning rate follows a cosine annealing schedule [Loshchilov and Hutter, 2017b] with 100 warmup steps. ... logit bias δ = 3.0, proportion of greenlist tokens γ = 0.25, and k = 2. In both cases, we use nucleus sampling [Holtzman et al., 2019] with p = 0.95 and T = 0.8. (Sketches of this configuration follow the table.) |
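
To make the quoted experiment setup concrete, here is a minimal sketch of the fine-tuning and sampling configuration using Hugging Face `transformers`. Only the hyperparameters are taken from the paper; the `output_dir` is a placeholder, and the model, tokenizer, and training dataset are elided.

```python
from transformers import TrainingArguments

# Hyperparameters quoted in the Experiment Setup row; everything else
# (output_dir, checkpoint, dataset) is a placeholder for illustration.
args = TrainingArguments(
    output_dir="finetune-out",       # placeholder
    max_steps=3000,                  # 3000 optimizer steps (~3 epochs in the paper's setup)
    per_device_train_batch_size=8,   # batch size of 8
    learning_rate=1e-5,              # 10^-5
    lr_scheduler_type="cosine",      # cosine annealing [Loshchilov and Hutter, 2017b]
    warmup_steps=100,                # 100 warmup steps
    optim="adamw_torch",             # AdamW [Loshchilov and Hutter, 2017a]
)
# Inputs would be tokenized to the paper's 2048-token context size, e.g.
# tokenizer(text, truncation=True, max_length=2048).

# Generation uses nucleus sampling [Holtzman et al., 2019]:
gen_kwargs = dict(do_sample=True, top_p=0.95, temperature=0.8)
# outputs = model.generate(**inputs, **gen_kwargs)  # model/inputs elided
```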
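The watermark parameters quoted above (logit bias δ = 3.0, greenlist fraction γ = 0.25, context width k = 2) correspond to a green-list watermark in the style of Kirchenbauer et al. [2023]: at each decoding step, the k previous tokens seed a pseudo-random partition of the vocabulary, and a fraction γ of tokens (the "green list") receives a logit boost of δ. A minimal sketch follows; the hashing scheme is illustrative, not the paper's exact seeding.

```python
import torch

DELTA, GAMMA, K = 3.0, 0.25, 2  # logit bias, greenlist fraction, context width (from the paper)

def greenlist_bias(logits: torch.Tensor, prev_tokens: list[int], vocab_size: int) -> torch.Tensor:
    """Add DELTA to the logits of a pseudo-random 'green' subset of the vocabulary.

    The green list is seeded by the K previous tokens; this hash is an
    illustrative stand-in for the actual seeding scheme.
    """
    seed = hash(tuple(prev_tokens[-K:])) % (2**31)
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(vocab_size, generator=g)
    green = perm[: int(GAMMA * vocab_size)]       # gamma fraction of the vocabulary
    biased = logits.clone()
    biased[green] += DELTA                        # boost green tokens before sampling
    return biased
```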
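Finally, the "provable confidence" claim in the Research Type row rests on a statistical test with a controlled false-positive rate. The paper's actual scoring is more involved (it de-duplicates k-grams and aggregates scores across many suspect-model outputs), but a simplified binomial version conveys the idea: under the null hypothesis that no watermarked data was used in training, each scored token of the suspect model lands in the green list with probability γ.

```python
from scipy.stats import binomtest

def radioactivity_pvalue(num_green: int, num_scored: int, gamma: float = 0.25) -> float:
    """One-sided binomial tail: the probability of observing at least
    `num_green` green tokens out of `num_scored` if the suspect model is
    NOT contaminated (each token green with probability `gamma`).
    Simplified relative to the paper's de-duplicated k-gram scoring."""
    return binomtest(num_green, num_scored, p=gamma, alternative="greater").pvalue

# Example: 2,900 green tokens out of 10,000 scored (rate 0.29 vs. gamma = 0.25)
# yields a vanishingly small p-value, i.e. strong evidence of radioactivity.
print(radioactivity_pvalue(2900, 10000))
```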