Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks
Authors: Huanming Shen, Baizhou Huang, Xiaojun Wan
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments demonstrate that SEEK outperforms KGW-family baselines, achieving substantial improvements in scrubbing and spoofing robustness across datasets, establishing a new Pareto frontier, yielding spoofing robustness gains of +88.2%/+92.3%/+82.0% on Dolly-CW/MMW-Book Reports/MMW-Fake News and scrubbing robustness gains of +10.2%/+13.4%/+8.6% on WikiText/C4/LFQA compared to KGW-Min with 4-gram watermark window. |
| Researcher Affiliation | Academia | Huanming Shen (1, 2), Baizhou Huang (1), Xiaojun Wan (1). (1) Wangxuan Institute of Computer Technology, Peking University; (2) University of Electronic Science and Technology of China |
| Pseudocode | Yes | Algorithm 1: Generation Algorithm. 1: Input: prompt {x_1, ..., x_N}; integer hash function H with hash space {1, 2, ..., d}; secret key ξ; watermark strength δ; watermark window size h; large language model P_M. 2: for n = N + 1, N + 2, ... do 3: Apply the language model to compute next logit vector ℓ_t ← P_M(x_{1:n−1}). 4: Calculate the hash signature of the watermark window I ← {H(x_{n−k}) \| 1 ≤ k ≤ h}. 5: Partition the vocabulary V into uniform sub-vocabularies {v_1, ..., v_d} by ξ. 6: Derive the cipher θ_n^i for each sub-vocabulary v_i following Eq 4. 7: Generate a sub-green list G_i for each sub-vocabulary v_i seeded by the corresponding θ_n^i. 8: Union all sub-green lists G_i as the green list G of the full vocabulary. 9: Add δ to the logits of tokens in G to modify the distribution by Eq 1. 10: Sample the next token from the modified distribution. 11: end for |
| Open Source Code | No | We provide a detailed explanation of the parameter usage and will release the source code to ensure reproducibility. |
| Open Datasets | Yes | We use C4 [51], WikiText [41], and LFQA datasets [31] to assess watermark robustness against scrubbing. For the spoofing attack, we use C4-Eval [51], Dolly-CW [10], MMW-Book Reports, and MMW-Fake News [49]. |
| Dataset Splits | Yes | all experimental results presented in the figures and tables are based on the same 500 positive samples and 500 negative samples. For the spoofing learning, the attacker generates original queries prompt using the C4-RealNewsLike subset, obtaining no fewer than n=30,000 responses, each with a maximum token length of 800. |
| Hardware Specification | Yes | All experiments are conducted on Nvidia A40 GPUs. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | Unless otherwise specified, the watermarking schemes in our experiments adopt the hyperparameter settings commonly used in prior work [29, 30, 27] (γ = 0.25, δ = 5), and a maximum generation length of 150 new tokens. In the spoofing attack experiments, all the watermark detectors are calibrated on the C4-RealNewsLike dataset using 2,000 watermarked and non-watermarked texts. ... The spoofing model is configured with a spoofer strength of 8.25, and a weighted loss objective defined by w_abcd = 2.0, w_partials = 1.0, w_empty = 0.5. ... Unless otherwise specified, SEEK in this paper uses the hyperparameters d=6 and h=6. |
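
Steps 4 to 9 of the generation algorithm quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper states that the source code will be released, and Eq. 4 is not reproduced on this page, so `stand_in_cipher` is a hypothetical placeholder for the cipher derivation. The toy hash `tok % d + 1`, all function names, and the use of Python's `random` module for seeding are likewise assumptions made for illustration. The defaults γ = 0.25, δ = 5, and d = 6 come from the Experiment Setup row.

```python
import hashlib
import random


def partition_vocab(vocab_size, d, key):
    """Step 5: shuffle token ids with the secret key, split into d near-equal parts."""
    ids = list(range(vocab_size))
    random.Random(key).shuffle(ids)
    return [ids[i::d] for i in range(d)]


def stand_in_cipher(signature, i, key):
    """Step 6 placeholder (assumption): Eq. 4 is not shown here, so this simply
    hashes the secret key, the sub-vocabulary index, and the window signature."""
    payload = f"{key}:{i}:{sorted(signature)}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")


def green_list(window, vocab_size, d=6, key=0, gamma=0.25):
    """Steps 4, 7, 8: build the green list as a union of per-sub-vocabulary lists."""
    # Step 4: hash signature of the window; toy hash into {1, ..., d} (assumption).
    signature = {tok % d + 1 for tok in window}
    green = set()
    for i, sub in enumerate(partition_vocab(vocab_size, d, key)):
        # Step 7: sample a gamma fraction of this sub-vocabulary, seeded by its cipher.
        rng = random.Random(stand_in_cipher(signature, i, key))
        green.update(rng.sample(sub, int(gamma * len(sub))))
    return green  # Step 8: union over all sub-vocabularies


def bias_logits(logits, green, delta=5.0):
    """Step 9: add delta to the logit of every green-list token."""
    return [l + delta if t in green else l for t, l in enumerate(logits)]


if __name__ == "__main__":
    window = [3, 14, 15, 92, 65, 35]  # last h = 6 token ids (made-up values)
    green = green_list(window, vocab_size=120, d=6, key=42)
    biased = bias_logits([0.0] * 120, green)
    print(len(green), max(biased))
```

With a vocabulary of 120 tokens and d = 6, each sub-vocabulary holds 20 tokens, so γ = 0.25 marks 5 tokens per sub-vocabulary as green. A real implementation would operate on the model's logit tensor and tokenizer vocabulary rather than plain Python lists.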