Watermark Stealing in Large Language Models
Authors: Nikola Jovanović, Robin Staab, Martin Vechev
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We are the first to propose an automated WS algorithm and use it in the first comprehensive study of spoofing and scrubbing in realistic settings. We show that for under $50 an attacker can both spoof and scrub state-of-the-art schemes previously considered safe, with average success rate of over 80%. |
| Researcher Affiliation | Academia | Nikola Jovanović 1, Robin Staab 1, Martin Vechev 1; 1Department of Computer Science, ETH Zurich. |
| Pseudocode | No | The paper describes its algorithm in prose and diagrams (e.g., Fig. 2) but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We make all our code and additional examples available at https://watermark-stealing.org. |
| Open Datasets | Yes | To query LMmo the attacker uses the C4 dataset's RealNewsLike subset (Raffel et al., 2020), also used in most prior work. |
| Dataset Splits | No | The paper references datasets used for evaluation and data collection for watermark stealing (e.g., C4 dataset, Dolly-CW), but does not explicitly provide training, validation, and test dataset splits for model training. |
| Hardware Specification | No | The paper mentions various language models (e.g., LLAMA-7B, MISTRAL-7B, GEMMA-7B) used in experiments, but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running these experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | We obtain n = 30,000 responses of token length 800;... We empirically set the clipping parameter c = 2 in all experiments;... In all experiments, for the watermarking schemes we use the default γ = 0.25 and δ = 4. We further generally use parameters ρatt = 1.6 and δatt = 7.5, tuning them on separate data (using only LMatt) when necessary. |
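The experiment setup above quotes the watermarking parameters γ = 0.25 and δ = 4, which correspond to the standard green-list fraction and logit bias of KGW-style schemes (Kirchenbauer et al.). As a minimal sketch of how these parameters enter detection, the snippet below computes the usual z-statistic z = (g − γT) / √(γ(1−γ)T), where g is the observed green-token count and T the text length; the function name and example counts are hypothetical, not taken from the paper.

```python
import math

def kgw_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """Standard KGW-style detection statistic: z = (g - gamma*T) / sqrt(gamma*(1-gamma)*T).

    Under the null (unwatermarked text), tokens land in the green list at
    rate gamma, so z is approximately standard normal; watermarked text
    (biased by delta toward green tokens) yields a large positive z.
    """
    expected = gamma * total_tokens
    std = math.sqrt(gamma * (1.0 - gamma) * total_tokens)
    return (green_count - expected) / std

# Hypothetical counts for a response of token length 800 (as in the paper's setup):
print(round(kgw_z_score(200, 800), 2))  # unwatermarked-looking: exactly the gamma rate -> 0.0
print(round(kgw_z_score(400, 800), 2))  # heavily green-biased text -> 16.33
```

A spoofing attack in this setting succeeds when attacker-generated text pushes this z-score past the detector's threshold; scrubbing succeeds when paraphrasing watermarked text pulls it back below.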