Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Provable Watermarking for Data Poisoning Attacks

Authors: Yifan Zhu, Lijia Yu, Xiao-Shan Gao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We validate our theoretical findings through experiments on several attacks, models, and datasets.
Researcher Affiliation Academia (1) State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) Institute of AI for Industries, Nanjing, China. EMAIL, EMAIL, EMAIL
Pseudocode Yes C Watermarking Algorithm. Algorithm 1 (Post-Poisoning Watermarking). Input: the poisoned training dataset D_P = {(x_i + δ_i^p, y_i)}_{i=1}^N; the key ζ. Output: the watermarked training dataset D_W = {(x_i + δ_i^p + δ^w, y_i)}_{i=1}^N. Choose the watermarking dimension W. Set δ^w = ε_w · sign(ζ)|_W. Algorithm 2 (Poisoning-Concurrent Watermarking). Input: the training dataset D_P = {(x_i, y_i)}_{i=1}^N; the key ζ. Output: the watermarked poisoned training dataset D_W = {(x_i + δ_i^p + δ^w, y_i)}_{i=1}^N. Choose the watermarking dimension W. Set δ^w = ε_w · sign(ζ)|_W. Update the poisons δ_i^p on the poisoning dimension P = [d] \ W. Algorithm 3 (Detection). Input: the suspect training data x; the key ζ; the detection threshold τ. Output: 1 (positive) or 0 (negative). Compute the detection value v = ζ^T x. If v > τ, return 1; else (v ≤ τ), return 0.
Open Source Code Yes Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have provided our codes in the supplemental material.
Open Datasets Yes We evaluate on CIFAR-10, CIFAR-100 [45], and Tiny-ImageNet dataset [48]. Table 5: The accuracy (Acc, %), ASR and AUROC of SST-2 dataset on BERT-base model [24] with different watermarking length q.
Dataset Splits Yes We evaluate on CIFAR-10, CIFAR-100 [45], and Tiny-ImageNet dataset [48].
Hardware Specification Yes All experiments are evaluated on a single NVIDIA A800 80GB PCIe GPU.
Software Dependencies No The paper does not explicitly state specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) used for the experiments. It only mentions model architectures like ResNet-18, VGG-19, etc., and BERT-base model.
Experiment Setup Yes The watermarking and poisoning budgets are set to 16/255 for backdoor attacks, and 8/255 for availability attacks. For victim model training, the total epochs are 200, initial learning rate is 0.5 with a cosine scheduler, the momentum and weight decay are 0.9 and 10^-4 respectively.
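
The post-poisoning watermarking and detection steps quoted in the Pseudocode row can be illustrated with a minimal sketch. This is not the authors' released code: the function names (`make_key`, `watermark_perturbation`, `apply_watermark`, `detect`), the random ±1 key, the flat-list data representation, and the threshold value are all assumptions made for illustration. Only the rules "add ε_w · sign(ζ) on the watermarking dimensions" and "flag a sample when ζ^T x exceeds τ" come from the quoted algorithms. The poisoning-concurrent variant (Algorithm 2) is not covered, since it depends on the specific poisoning attack.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def make_key(d, seed=0):
    """Hypothetical key generator: a random +/-1 vector of length d (an assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(d)]


def watermark_perturbation(key, dims, eps_w):
    """Build the watermark: eps_w * sign(key) on the chosen dimensions, zero elsewhere."""
    delta = [0.0] * len(key)
    for j in dims:
        delta[j] = eps_w * sign(key[j])
    return delta


def apply_watermark(dataset, delta_w):
    """Add the same watermark to every (already poisoned) sample; labels are unchanged."""
    return [([xj + dj for xj, dj in zip(x, delta_w)], y) for x, y in dataset]


def detect(x, key, tau):
    """Return 1 if the detection value key^T x exceeds tau, otherwise 0."""
    v = sum(k * xj for k, xj in zip(key, x))
    return 1 if v > tau else 0


if __name__ == "__main__":
    d = 8
    key = make_key(d, seed=1)
    dims = [0, 2, 4, 6]        # watermarking dimensions (chosen arbitrarily here)
    eps_w = 16 / 255           # budget quoted for backdoor attacks
    tau = 0.1                  # illustrative threshold, not taken from the paper

    poisoned = [([0.0] * d, 0)]  # stand-in for one poisoned sample
    delta_w = watermark_perturbation(key, dims, eps_w)
    watermarked = apply_watermark(poisoned, delta_w)

    print(detect(watermarked[0][0], key, tau))  # watermarked sample
    print(detect(poisoned[0][0], key, tau))     # unwatermarked sample
```

In this toy setting the detection value of a watermarked all-zero sample equals ε_w times the number of watermarking dimensions, so the gap between watermarked and unwatermarked samples grows with the watermark length. On real images the natural term ζ^T x also contributes, which is why the threshold τ has to be chosen with care.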