Defense against Model Extraction Attack by Bayesian Active Watermarking
Authors: Zhenyi Wang, Yihan Wu, Heng Huang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically conduct extensive experiments across various model extraction settings and datasets to protect the victim model, which is trained using either supervised learning or self-supervised learning. The outcomes reveal that, in contrast to the SOTA defensive training method (Wang et al., 2023), our approach necessitates only minimal finetuning of the victim model, resulting in a noteworthy reduction in re-training costs by 87%. Additionally, it achieves a 17–172× speed-up compared to (Orekondy et al., 2020; Mazeika et al., 2022) during inference. Furthermore, our approach surpasses other SOTA defense methods by up to 12% across various query budgets. Meanwhile, we conduct theoretical analysis to provide a performance guarantee for our proposed method. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park, USA. |
| Pseudocode | Yes | Algorithm 1 Active watermarking for model extraction defense. (A generic illustrative sketch of watermark-embedding fine-tuning appears below the table.) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Datasets: We assess various defense methods using datasets such as MNIST, CIFAR10, CIFAR100 (Krizhevsky, 2009), and MiniImageNet (Vinyals et al., 2016). |
| Dataset Splits | No | The paper mentions 'in-distribution training data' and 'synthetic OOD data' but does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for the datasets used to fine-tune their victim model or for general experiment reproduction. |
| Hardware Specification | Yes | We conduct an efficiency evaluation presented in Table 14 with an A6000 GPU. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | To train a stolen model, we employ comparable hyperparameters to those used in training the victim models, including a batch size of either 64 or 256, an initial learning rate of 0.0001, and the Adam optimizer. In instances of stealing from the ImageNet victim model, we opt for a larger learning rate of 0.1 or 1.0 and a batch size of 256 or larger. (An illustrative training-loop sketch based on these hyperparameters follows the table.) |
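
The Pseudocode row cites Algorithm 1 (active watermarking for model extraction defense). The paper's exact Bayesian procedure is not reproduced here; the following is only a minimal, generic sketch of watermark-embedding fine-tuning, assuming a PyTorch victim classifier, a clean training loader, and a small trigger set. The names `watermark_inputs`, `watermark_labels`, and `lambda_wm` are illustrative assumptions, not quantities defined in the paper.

```python
import torch
import torch.nn.functional as F

def finetune_with_watermark(victim, clean_loader, watermark_inputs,
                            watermark_labels, lambda_wm=0.1, epochs=1, lr=1e-4):
    """Generic watermark-embedding fine-tuning (illustrative only).

    The victim is fine-tuned to keep its task accuracy on clean data while
    producing designated outputs on a small set of watermark (trigger)
    inputs, so that a model distilled from the victim's predictions tends
    to inherit the watermark behaviour.
    """
    opt = torch.optim.Adam(victim.parameters(), lr=lr)
    victim.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            task_loss = F.cross_entropy(victim(x), y)            # preserve utility on clean data
            wm_loss = F.cross_entropy(victim(watermark_inputs),  # enforce watermark responses
                                      watermark_labels)
            (task_loss + lambda_wm * wm_loss).backward()
            opt.step()
    return victim
```

In this sketch, ownership could later be checked by measuring how often a suspect model reproduces `watermark_labels` on `watermark_inputs`; the paper's method additionally optimizes the watermarks themselves, which is not shown here.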
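
Similarly, below is a rough sketch of the stolen-model training setup quoted in the Experiment Setup row, assuming the attacker distills the victim's soft predictions with Adam at a learning rate of 0.0001 and a batch size of 256. The `victim_api` wrapper, the `query_set`, and the KL-divergence distillation loss are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train_stolen_model(stolen, victim_api, query_set,
                       batch_size=256, lr=1e-4, epochs=10):
    """Train a surrogate on victim outputs (illustrative extraction setup).

    `victim_api(x)` is assumed to return the victim's (possibly watermarked)
    class probabilities for a batch of query inputs.
    """
    loader = DataLoader(query_set, batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(stolen.parameters(), lr=lr)  # lr = 0.0001, as quoted above
    stolen.train()
    for _ in range(epochs):
        for x, _ in loader:                 # labels, if any, are ignored by the attacker
            with torch.no_grad():
                soft_targets = victim_api(x)  # query the victim model
            opt.zero_grad()
            log_probs = F.log_softmax(stolen(x), dim=1)
            loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
            loss.backward()
            opt.step()
    return stolen
```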