Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection
Authors: Yiming Li, Yang Bai, Yong Jiang, Yong Yang, Shu-Tao Xia, Bo Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses. |
| Researcher Affiliation | Collaboration | 1Tsinghua Shenzhen International Graduate School, Tsinghua University, China; 2Tencent Security Zhuque Lab, China; 3Tencent Security Platform Department, China; 4The Department of Computer Science, University of Illinois at Urbana-Champaign, USA |
| Pseudocode | No | The paper describes the methods (UBW-P, UBW-C) with mathematical formulations and prose, but it does not contain a distinct, structured pseudocode or algorithm block that is clearly labeled as such. |
| Open Source Code | Yes | Our codes are available at https://github.com/THUYimingLi/Untargeted_Backdoor_Watermark. |
| Open Datasets | Yes | In this paper, we conduct experiments on two classical benchmark datasets, including CIFAR-10 [1] and (a subset of) ImageNet [2], with ResNet-18 [46]. |
| Dataset Splits | No | The paper specifies a training set of 25,000 images and a test set of 2,500 images for ImageNet, but it does not mention a separate validation split or how one would be derived for reproducibility. It states: "Specifically, we randomly select a subset containing 50 classes with 25,000 images from the original ImageNet for training (500 images per class) and 2,500 images for testing (50 images per class)." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or memory specifications. Mentions of funding from "Tencent Rhino-Bird Research Program, and the C3 AI and Amazon research awards" are not hardware specifications. |
| Software Dependencies | No | The paper mentions models like ResNet-18 and optimization methods like SGD but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x) necessary for replication. |
| Experiment Setup | Yes | We set the poisoning rate γ = 0.1 for all watermarks on both datasets. In particular, since the label-consistent attack can only modify samples from the target class, its poisoning rate is set to its maximum (i.e., 0.02) on the ImageNet dataset. The target label yt is set to 1 for all targeted watermarks. We set λ = 2 for UBW-C on both datasets. We set τ = 0.25 for the hypothesis-test in all cases. |
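
As the Pseudocode row notes, the paper describes UBW-P only in prose and formulas. The procedure it describes (stamp a trigger on a γ-fraction of training samples and randomly shuffle their labels so the resulting misclassifications are dispersed across classes) is simple enough to sketch. The code below is a minimal illustration, not the authors' implementation: the function name, NumPy array layout, and the 3×3 corner patch trigger are all assumptions.

```python
import random
import numpy as np

def ubw_p_poison(images, labels, num_classes, gamma=0.1, seed=0):
    """Minimal sketch of UBW-P-style poisoning: a gamma-fraction of samples
    receives a patch trigger and a randomly shuffled (wrong) label.

    images: float array of shape (N, H, W, C) in [0, 1]; labels: int array (N,).
    The white 3x3 corner patch is an illustrative trigger, not the paper's.
    """
    rng = random.Random(seed)
    images, labels = images.copy(), labels.copy()
    poisoned_idx = rng.sample(range(len(images)), int(gamma * len(images)))
    for i in poisoned_idx:
        images[i, -3:, -3:, :] = 1.0  # stamp the trigger patch
        # draw a label uniformly from all classes except the true one
        wrong = rng.randrange(num_classes - 1)
        labels[i] = wrong if wrong < labels[i] else wrong + 1
    return images, labels, poisoned_idx
```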
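
The Dataset Splits row quotes the paper's class-balanced ImageNet subset (50 randomly selected classes, 500 training and 50 test images per class), but the paper does not say how the selection was made reproducible. One plausible way to reconstruct such a split, assuming the data is available as a list of (image_path, label) pairs; all names here are illustrative:

```python
import random
from collections import defaultdict

def make_imagenet_subset(samples, num_classes=50, train_per_class=500,
                         test_per_class=50, seed=0):
    """Build a class-balanced subset: 50 classes, 500 train / 50 test images
    per class, yielding the 25,000 / 2,500 split reported in the paper."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    chosen = rng.sample(sorted(by_class), num_classes)
    train, test = [], []
    for new_label, cls in enumerate(chosen):  # relabel classes 0..49
        paths = by_class[cls]
        rng.shuffle(paths)
        train += [(p, new_label) for p in paths[:train_per_class]]
        test += [(p, new_label)
                 for p in paths[train_per_class:train_per_class + test_per_class]]
    return train, test
```

Fixing the seed is an assumption on our part; without it (or a published class list), the exact subset cannot be recovered.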
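
The Experiment Setup row scatters the reported hyperparameters across several sentences. Collecting them into one config makes the values easy to check against the paper; this is a hedged sketch whose key names are illustrative, not taken from the released code:

```python
# Reported hyperparameters from the paper; key names are our own.
SETUP = {
    "datasets": ["CIFAR-10", "ImageNet-50"],   # ImageNet subset, 50 classes
    "model": "ResNet-18",
    "poisoning_rate": 0.10,                    # gamma, all watermarks, both datasets
    "poisoning_rate_lc_imagenet": 0.02,        # cap for the label-consistent attack
    "target_label": 1,                         # y_t for all targeted watermarks
    "ubw_c_lambda": 2.0,                       # lambda for UBW-C
    "tau": 0.25,                               # hypothesis-test threshold
}

def num_poisoned(train_size, gamma=SETUP["poisoning_rate"]):
    """Watermarked-sample count implied by gamma, e.g. 2,500 of the
    25,000-image ImageNet subset."""
    return int(gamma * train_size)
```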