Cosine Model Watermarking against Ensemble Distillation
Authors: Laurent Charette, Lingyang Chu, Yizhou Chen, Jian Pei, Lanjun Wang, Yong Zhang
AAAI 2022, pp. 9512-9520
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on public data sets demonstrate the excellent performance of CosWM and its advantages over the state-of-the-art baselines. In this section, we evaluate the performance of CosWM on the model watermarking task. We first describe the settings and data sets. Then we present a case study to demonstrate the working process of CosWM. We compare the performance of all the methods in two scenarios. |
| Researcher Affiliation | Collaboration | 1 Huawei Technologies Canada, Burnaby, Canada, 2 McMaster University, Hamilton, Canada, 3 Simon Fraser University, Burnaby, Canada, 4 Tianjin University, Tianjin, China |
| Pseudocode | Yes | Algorithm 1: Extracting signal in a model (a hedged extraction sketch is given below the table) |
| Open Source Code | Yes | Code for this case study is available online https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=2d937a91-1692-4f88-94ca-82e1ae8d4d79 |
| Open Datasets | Yes | We conduct all the experiments using two public data sets, FMNIST (Xiao, Rasul, and Vollgraf 2017), and CIFAR10 (Krizhevsky 2009). |
| Dataset Splits | Yes | We partition all the training examples randomly into two halves, using one half for training the teacher models and the other half for distilling the student models (a split sketch is given below the table). |
| Hardware Specification | Yes | All the experiments are conducted on Dell Alienware with Intel(R) Core(TM) i9-9980XE CPU, 128 GB memory, NVIDIA 1080 Ti, and Ubuntu 16.04. |
| Software Dependencies | Yes | We implement CosWM and replicate DAWN in PyTorch 1.3. The Fingerprinting code is provided by the authors of the corresponding paper (Lukas, Zhang, and Kerschbaum 2019) and is implemented in Keras using a TensorFlow v2.1 backend. |
| Experiment Setup | Yes | All models are trained or distilled for 100 epochs to guarantee convergence. The models with the best testing accuracy during training/distillation are retained. To train the watermarked teacher model, we set the signal amplitude ε = 0.05 and the watermark key K = (f_w, i, v_0) with f_w = 30.0, i = 0, and v_0 a random unit vector (an embedding sketch is given below the table). |
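
To make the setup above concrete, the following is a minimal NumPy sketch of how a periodic signal of amplitude ε = 0.05 and frequency f_w = 30.0 could be embedded into the class-i output of a teacher model along a random unit projection v_0. Only the parameter values come from the paper's setup; the additive-then-renormalize form and the helper name `embed_cosine_watermark` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def embed_cosine_watermark(probs, x, v0, eps=0.05, f_w=30.0, i=0):
    """Perturb output probabilities with a cosine watermark signal.

    The probability of target class `i` is shifted by
    eps * cos(2*pi*f_w * <v0, x>) and the vector is renormalized.
    eps, f_w, i, and the unit vector v0 match the paper's setup;
    the additive form itself is an assumption for illustration.
    """
    phase = 2.0 * np.pi * f_w * float(v0 @ x.ravel())
    q = probs.astype(float).copy()
    q[i] = max(q[i] + eps * np.cos(phase), 1e-8)  # keep a valid probability
    return q / q.sum()                            # renormalize to sum to 1

# v0 is drawn once per watermark key and reused for every query.
rng = np.random.default_rng(0)
v0 = rng.standard_normal(28 * 28)
v0 /= np.linalg.norm(v0)                          # unit random vector
```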
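The Pseudocode row quotes Algorithm 1 ("Extracting signal in a model"). One plausible reading, sketched below, is a single-frequency correlation: query the suspect model, record its class-i output against the projection v0·x of each input, and estimate the amplitude at f_w. The names `extract_signal_strength` and `model_probs` are hypothetical; this is an assumed reconstruction, not the paper's pseudocode.

```python
import numpy as np

def extract_signal_strength(model_probs, xs, v0, f_w=30.0, i=0):
    """Estimate the watermark amplitude at frequency f_w.

    Correlates the suspect model's class-i outputs with cos/sin at
    f_w over the projections t = <v0, x> (a single-frequency DFT).
    An amplitude well above the noise floor suggests the watermark
    survived distillation.
    """
    t = np.array([float(v0 @ x.ravel()) for x in xs])
    p = np.array([model_probs(x)[i] for x in xs])
    p = p - p.mean()                        # drop the DC component
    c = (p * np.cos(2 * np.pi * f_w * t)).mean()
    s = (p * np.sin(2 * np.pi * f_w * t)).mean()
    return 2.0 * np.hypot(c, s)             # correlation -> amplitude estimate
```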
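Finally, the 50/50 training split described in the Dataset Splits row can be reproduced with PyTorch's `random_split`. This sketch assumes a current torchvision/PyTorch API (the paper used PyTorch 1.3) and an arbitrary seed; torchvision's FashionMNIST stands in for the paper's FMNIST data set.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Randomly partition the training examples into two halves: one for
# training the (watermarked) teacher models, the other for distilling
# the student models. The seed is an arbitrary choice for repeatability.
train = datasets.FashionMNIST("data", train=True, download=True,
                              transform=transforms.ToTensor())
half = len(train) // 2
teacher_half, student_half = random_split(
    train, [half, len(train) - half],
    generator=torch.Generator().manual_seed(0))
```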