Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand

Authors: Junfeng Guo, Yiming Li, Lixu Wang, Shu-Tao Xia, Heng Huang, Cong Liu, Bo Li

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on three benchmark datasets are conducted, which verify the effectiveness of our method and its resistance to potential adaptive methods. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, University of Maryland; (2) ZJU-Hangzhou Global Scientific and Technological Innovation Center; (3) School of Cyber Science and Technology, Zhejiang University; (4) Department of Computer Science, Northwestern University; (5) Tsinghua Shenzhen International Graduate School, Tsinghua University; (6) Department of Electronic and Computer Engineering, UC Riverside; (7) Department of Computer Science, University of Illinois Urbana-Champaign; (8) Department of Computer Science, University of Chicago |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The code for reproducing main experiments is available at https://github.com/JunfengGo/Domain-Watermark. (Abstract) and In particular, we also release our training codes at https://github.com/JunfengGo/Domain-Watermark. (Section L) |
| Open Datasets | Yes | Extensive experiments on three benchmark datasets are conducted (Abstract) and In this section, we conduct experiments on CIFAR-10 [1] and Tiny-ImageNet [42] with VGG [43] and ResNet [44], respectively. Results on STL-10 [45] are in Appendix F. (Section 5) |
| Dataset Splits | Yes | CIFAR-10: the dataset contains 10 labels, 50,000 training samples, and 10,000 validation samples (Appendix E.1). STL-10: the dataset contains 10 labels, 13,000 labeled samples, and 100,000 unlabeled samples; the labeled samples are divided into training and validation sets with a ratio of 8:2 (Appendix E.1). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) were mentioned for running experiments. |
| Software Dependencies | No | We implement all baseline methods based on BackdoorBox [46]. (Section 5.1) No specific version numbers for software libraries or dependencies are provided. |
| Experiment Setup | Yes | In the experiments, we train each model for 150 epochs with an initial learning rate of 0.1. Following previous work [29, 62], we schedule learning-rate drops at epochs 14, 24, and 35 by a factor of 0.1. For all models, we employ SGD with Nesterov momentum, and we set the momentum coefficient to 0.9. We use batches of 128 images and weight decay with a coefficient of 4 × 10⁻⁴. (Appendix E.3) and we set the watermarking rate γ = 0.1 and the perturbation constraint ϵ = 16/255 in all cases (Section 5.1). |
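
For reference, the Experiment Setup row corresponds to a standard supervised training configuration. The sketch below shows how the quoted hyperparameters (150 epochs, initial learning rate 0.1 with drops at epochs 14, 24, and 35, SGD with Nesterov momentum 0.9, batch size 128, weight decay 4 × 10⁻⁴) could be wired up in PyTorch. It is an illustration only, not the authors' released code: the use of torchvision's VGG-16, the CIFAR-10 normalization statistics, and the data-loading details are assumptions, and the watermark-crafting step (γ = 0.1, ϵ = 16/255) is not shown.

```python
# Minimal PyTorch sketch of the quoted training configuration (Appendix E.3 / Section 5.1).
# Assumptions for illustration: torchvision's VGG-16 as a stand-in for the paper's VGG,
# standard CIFAR-10 normalization constants, and plain data loading.
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # Normalization constants are an assumption; the paper does not list them.
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)  # batches of 128

model = torchvision.models.vgg16(num_classes=10)  # stand-in; CIFAR-adapted VGG variants are common

# SGD with Nesterov momentum 0.9, initial learning rate 0.1, weight decay 4e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=4e-4)
# Learning-rate drops at epochs 14, 24, and 35 by a factor of 0.1, as quoted above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[14, 24, 35], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(150):  # 150 training epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The authors' released training code at the GitHub repository linked in the Open Source Code row remains the authoritative reference for the exact setup.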