Defense against Model Extraction Attack by Bayesian Active Watermarking

Authors: Zhenyi Wang, Yihan Wu, Heng Huang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We systematically conduct extensive experiments across various model extraction settings and datasets to protect the victim model, which is trained using either supervised learning or self-supervised learning. The outcomes reveal that, in contrast to the SOTA defensive training method (Wang et al., 2023), our approach necessitates only minimal finetuning of the victim model, resulting in a noteworthy reduction in re-training costs by 87%. Additionally, it achieves 17–172× speed up compared to (Orekondy et al., 2020; Mazeika et al., 2022) during inference. Furthermore, our approach surpasses other SOTA defense methods by up to 12% across various query budgets. Meanwhile, we conduct theoretical analysis to provide the performance guarantee for our proposed method.
Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, USA.
Pseudocode | Yes | Algorithm 1: Active watermarking for model extraction defense.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Datasets: We assess various defense methods using datasets such as MNIST, CIFAR10, CIFAR100 (Krizhevsky, 2009), Mini-ImageNet (Vinyals et al., 2016).
Dataset Splits | No | The paper mentions 'in-distribution training data' and 'synthetic OOD data' but does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for the datasets used to fine-tune their victim model or for general experiment reproduction.
Hardware Specification | Yes | We conduct an efficiency evaluation presented in Table 14 with an A6000 GPU.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | To train a stolen model, we employ comparable hyperparameters to those used in training the victim models, including a batch size of either 64 or 256, an initial learning rate of 0.0001, and the Adam optimizer. In instances of stealing from the ImageNet victim model, we opt for a larger learning rate of 0.1 or 1.0 and a batch size ranging from 256.
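For orientation, the Experiment Setup row quoted above describes the attacker-side (model-stealing) training that the defense is evaluated against, not the Bayesian active watermarking defense itself. The sketch below is a minimal, assumption-laden illustration of that setup: it loads one of the cited public datasets (CIFAR10 via torchvision) as query data and trains a surrogate model with the reported hyperparameters (Adam optimizer, initial learning rate 0.0001, batch size 256). The surrogate architecture (ResNet-18), the epoch count, and the `query_victim` helper are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of the model-stealing training loop described in the
# Experiment Setup row. Architecture, epoch count, and the soft-label
# distillation objective are assumptions made for illustration only.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18


def query_victim(victim, x):
    """Placeholder: query the (possibly watermarked) victim model for soft labels."""
    with torch.no_grad():
        return F.softmax(victim(x), dim=1)


def train_stolen_model(victim, epochs=10, batch_size=256, lr=1e-4):
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Query data drawn from a public dataset cited in the paper; CIFAR10 here,
    # MNIST/CIFAR100 are analogous (Mini-ImageNet requires a separate download).
    query_set = datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transforms.ToTensor())
    loader = DataLoader(query_set, batch_size=batch_size, shuffle=True)

    # Reported hyperparameters: Adam optimizer, initial learning rate 0.0001,
    # batch size 64 or 256 (256 used here).
    stolen = resnet18(num_classes=10).to(device)
    optimizer = torch.optim.Adam(stolen.parameters(), lr=lr)

    victim = victim.to(device).eval()
    for _ in range(epochs):
        for x, _ in loader:  # ground-truth labels are unused; only victim outputs are
            x = x.to(device)
            targets = query_victim(victim, x)
            loss = F.kl_div(F.log_softmax(stolen(x), dim=1), targets,
                            reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return stolen
```

A call such as `train_stolen_model(victim_model)` with a trained (and watermarked) victim would then produce the surrogate whose accuracy the paper's defense aims to degrade.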