Watermarking Makes Language Models Radioactive

Authors: Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate the radioactivity of text generated by large language models (LLMs), i.e., whether it is possible to detect that such synthetic input was used to train a subsequent LLM. Our new methods, specialized for radioactivity, detect with provable confidence weak residuals of the watermark signal in the fine-tuned LLM. (A toy sketch of this kind of detection appears after the table.)
Researcher Affiliation | Collaboration | Tom Sander (Meta FAIR & École polytechnique), Pierre Fernandez (Meta FAIR & Inria Rennes), Alain Durmus (École polytechnique), Matthijs Douze (Meta FAIR), Teddy Furon (Inria Rennes)
Pseudocode | No | The paper describes methods and procedures in narrative text and figures but does not contain a formally structured or labeled pseudocode or algorithm block.
Open Source Code | Yes | Radioactivity detection code is available at https://github.com/facebookresearch/radioactive-watermark
Open Datasets | Yes | The first fine-tuning is done with the setup presented in Sec. 5, with ρ = 10% of watermarked data, and the second on OASST1 [Köpf et al., 2024].
Dataset Splits | No | The paper describes the datasets used for fine-tuning and the evaluation benchmarks, but it does not specify explicit training/validation/test splits for its own generated instruction/answer pairs or for the benchmarks, so the data partitioning for these experiments cannot be reproduced.
Hardware Specification | Yes | For our experiments, we utilized an internal cluster. ... on a single node equipped with 8 V100 GPUs. ... on a single V100 GPU.
Software Dependencies | No | The paper mentions optimizers and sampling methods, but it does not provide version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, or scikit-learn) required to replicate the experiments.
Experiment Setup | Yes | We use AdamW [Loshchilov and Hutter, 2017a] for 3000 steps, with a batch size of 8, a learning rate of 10⁻⁵, and a context size of 2048 tokens (which results in 3 training epochs). The learning rate follows a cosine annealing schedule [Loshchilov and Hutter, 2017b] with 100 warmup steps. ... logit bias δ = 3.0, proportion of greenlist tokens γ = 0.25, and k = 2. In both cases, we use nucleus sampling [Holtzman et al., 2019] with p = 0.95 and T = 0.8. (Configuration sketches follow the table.)
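
The Research Type row quotes the paper's claim that watermark residuals can be detected with provable confidence. As a toy illustration of the green-list scoring that this family of watermarks relies on (in the spirit of the Kirchenbauer et al. scheme the paper builds on), here is a minimal Python sketch. The hash-based green-list assignment and the function names are illustrative assumptions, not the authors' implementation, which lives in the linked repository:

```python
# Minimal sketch of green-list watermark scoring, using the paper's
# reported parameters gamma = 0.25 and k = 2. The toy hash-based green-list
# assignment below is an assumption for illustration only.
import hashlib

from scipy.stats import norm

GAMMA = 0.25  # proportion of greenlist tokens (paper's setting)
K = 2         # watermark window: the seed depends on the previous k tokens


def is_green(window: tuple, token: int) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by `window`."""
    digest = hashlib.sha256(str(window + (token,)).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def watermark_pvalue(tokens: list) -> float:
    """One-sided z-test: under H0 (no watermark) each scored token is green
    with probability GAMMA, so an excess of green tokens is evidence of the
    watermark and, after fine-tuning on watermarked text, of its
    'radioactive' residue in the suspect model's outputs."""
    scored = [is_green(tuple(tokens[i - K:i]), tokens[i])
              for i in range(K, len(tokens))]
    n = len(scored)
    greens = sum(scored)
    z = (greens - GAMMA * n) / (GAMMA * (1 - GAMMA) * n) ** 0.5
    return float(norm.sf(z))  # small p-value => watermark signal detected
```

A very small p-value computed over text sampled from the suspect model is the kind of provable confidence the abstract refers to; the threshold and the exact scoring the authors use are specified in the paper and repository, not here.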
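
For the Experiment Setup row, the quoted hyperparameters translate almost directly into a training configuration. The sketch below, assuming plain PyTorch and a placeholder model, shows one way to realize the stated AdamW optimizer and cosine schedule with 100 warmup steps; it illustrates the reported settings and is not the authors' training script:

```python
# Hedged sketch of the reported fine-tuning configuration in plain PyTorch.
# Only the hyperparameter values are from the paper; everything else is a
# placeholder.
import math

import torch

TOTAL_STEPS = 3000   # reported number of optimization steps
WARMUP_STEPS = 100   # reported warmup length
BASE_LR = 1e-5       # reported learning rate
BATCH_SIZE = 8       # sequences per step (reported)
CONTEXT_SIZE = 2048  # tokens per sequence (reported)

model = torch.nn.Linear(CONTEXT_SIZE, CONTEXT_SIZE)  # stand-in for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)


def lr_lambda(step: int) -> float:
    """Linear warmup, then cosine annealing to zero (one common reading of
    the schedule described in the paper)."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Training loop skeleton: step the scheduler once per optimizer step.
# for step in range(TOTAL_STEPS):
#     loss = ...  # forward pass on a BATCH_SIZE x CONTEXT_SIZE token batch
#     loss.backward(); optimizer.step(); optimizer.zero_grad(); scheduler.step()
```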
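
The same row also quotes the generation-side parameters (logit bias δ = 3.0, nucleus sampling with p = 0.95, T = 0.8). A self-contained sketch of how such watermarked sampling typically combines them, again an illustration rather than the authors' code:

```python
# Hedged sketch of watermarked nucleus sampling with the quoted parameters.
# The green-list assignment repeats the toy hash from the detection sketch.
import hashlib

import torch

DELTA, GAMMA, K = 3.0, 0.25, 2     # reported watermark parameters
TOP_P, TEMPERATURE = 0.95, 0.8     # reported sampling parameters


def is_green(window: tuple, token: int) -> bool:
    """Toy green-list assignment seeded by the previous K tokens."""
    digest = hashlib.sha256(str(window + (token,)).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def sample_next(logits: torch.Tensor, window: tuple) -> int:
    """Bias greenlist logits by DELTA, then nucleus-sample at TOP_P."""
    logits = logits.clone()
    for tok in range(logits.numel()):  # toy loop; real code vectorizes this
        if is_green(window, tok):
            logits[tok] += DELTA
    probs = torch.softmax(logits / TEMPERATURE, dim=-1)
    sorted_p, sorted_idx = probs.sort(descending=True)
    keep = sorted_p.cumsum(-1) - sorted_p < TOP_P  # smallest set of mass >= p
    sorted_p[~keep] = 0.0
    choice = torch.multinomial(sorted_p / sorted_p.sum(), 1).item()
    return int(sorted_idx[choice])
```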