Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand
Authors: Junfeng Guo, Yiming Li, Lixu Wang, Shu-Tao Xia, Heng Huang, Cong Liu, Bo Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three benchmark datasets are conducted, which verify the effectiveness of our method and its resistance to potential adaptive methods. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, University of Maryland; ²ZJU-Hangzhou Global Scientific and Technological Innovation Center; ³School of Cyber Science and Technology, Zhejiang University; ⁴Department of Computer Science, Northwestern University; ⁵Tsinghua Shenzhen International Graduate School, Tsinghua University; ⁶Department of Electronic and Computer Engineering, UC Riverside; ⁷Department of Computer Science, University of Illinois Urbana-Champaign; ⁸Department of Computer Science, University of Chicago |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The code for reproducing main experiments is available at https://github.com/JunfengGo/Domain-Watermark. (Abstract) and In particular, we also release our training codes at https://github.com/JunfengGo/Domain-Watermark. (Appendix L) |
| Open Datasets | Yes | Extensive experiments on three benchmark datasets are conducted (Abstract) and In this section, we conduct experiments on CIFAR-10 [1] and Tiny-ImageNet [42] with VGG [43] and ResNet [44], respectively. Results on STL-10 [45] are in Appendix F. (Section 5) |
| Dataset Splits | Yes | CIFAR-10. CIFAR-10 dataset contains 10 labels, 50,000 training samples, and 10,000 validation samples. (Appendix E.1) and STL-10. STL-10 dataset contains 10 labels, 13,000 labeled samples, and 100,000 unlabeled samples. We divide the labeled samples into the training and validation dataset with a ratio of 8:2. (Appendix E.1) A sketch of this split is given after the table. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) were mentioned for running experiments. |
| Software Dependencies | No | We implement all baseline methods based on BackdoorBox [46]. (Section 5.1). No version numbers for software libraries or dependencies are provided. |
| Experiment Setup | Yes | In the experiments, we train each model with 150 epochs with an initialized learning rate of 0.1. Following previous work [29, 62], we schedule learning rate drops at epochs 14, 24, and 35 by a factor of 0.1. For all models, we employ SGD with Nesterov momentum, and we set the momentum coefficient to 0.9. We use batches of 128 images and weight decay with a coefficient of 4 × 10⁻⁴. (Appendix E.3) and we set the watermarking rate γ = 0.1 and perturbation constraint ϵ = 16/255 in all cases (Section 5.1). A sketch of this training recipe is given after the table. |
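
The STL-10 split quoted above is simple enough to reconstruct. The following is a minimal sketch, assuming torchvision's `STL10` loader and a fixed seed of our own choosing; the paper does not specify how the 8:2 partition was seeded or shuffled, so treat everything beyond the 13,000-sample count and the ratio as an assumption:

```python
# Hedged sketch: reproducing the STL-10 8:2 labeled-data split from Appendix E.1.
# The loader choice and the fixed seed are illustrative assumptions, not the
# authors' released code.
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# STL-10 ships 5,000 labeled "train" and 8,000 labeled "test" images,
# 13,000 labeled samples in total, matching the paper's count.
labeled = ConcatDataset([
    datasets.STL10(root="./data", split="train", download=True, transform=transform),
    datasets.STL10(root="./data", split="test", download=True, transform=transform),
])

# 8:2 train/validation ratio, as stated in Appendix E.1.
n_train = int(0.8 * len(labeled))   # 10,400 training samples
n_val = len(labeled) - n_train      # 2,600 validation samples
train_set, val_set = random_split(
    labeled, [n_train, n_val],
    generator=torch.Generator().manual_seed(0),  # assumed seed; not from the paper
)
print(len(train_set), len(val_set))
```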
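
Likewise, the Appendix E.3 recipe maps directly onto standard PyTorch components. The sketch below wires the reported hyperparameters (SGD with Nesterov momentum 0.9, initial learning rate 0.1 dropped by 0.1× at epochs 14, 24, and 35, weight decay 4 × 10⁻⁴, batch size 128, 150 epochs) into a plain training loop; the VGG model, the loss, and the loop structure are placeholders of ours, not the authors' released code:

```python
# Hedged sketch of the training recipe reported in Appendix E.3. Model choice,
# loss, and data wiring are illustrative assumptions; the released repository
# is authoritative for the exact loop.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.models import vgg16

model = vgg16(num_classes=10).cuda()   # VGG on CIFAR-10, per Section 5
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, nesterov=True, weight_decay=4e-4
)
# Learning rate drops by a factor of 0.1 at epochs 14, 24, and 35.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[14, 24, 35], gamma=0.1
)

# train_set as produced by the split sketch above (an assumption).
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(150):
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Note that the milestone epochs fall early in the 150-epoch schedule, exactly as the quoted text states; the paper's watermarking-specific steps (γ = 0.1, ϵ = 16/255) are not reproduced here.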