Defense against Model Extraction Attack by Bayesian Active Watermarking

Authors: Zhenyi Wang, Yihan Wu, Heng Huang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We systematically conduct extensive experiments across various model extraction settings and datasets to protect the victim model, which is trained using either supervised learning or self-supervised learning. The outcomes reveal that, in contrast to the SOTA defensive training method (Wang et al., 2023), our approach necessitates only minimal finetuning of the victim model, resulting in a noteworthy reduction in re-training costs by 87%. Additionally, it achieves 17–172× speed up compared to (Orekondy et al., 2020; Mazeika et al., 2022) during inference. Furthermore, our approach surpasses other SOTA defense methods by up to 12% across various query budgets. Meanwhile, we conduct theoretical analysis to provide the performance guarantee for our proposed method.
Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, USA.
Pseudocode | Yes | Algorithm 1: Active watermarking for model extraction defense.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Datasets: We assess various defense methods using datasets such as MNIST, CIFAR10, CIFAR100 (Krizhevsky, 2009), Mini-ImageNet (Vinyals et al., 2016).
Dataset Splits | No | The paper mentions 'in-distribution training data' and 'synthetic OOD data' but does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for the datasets used to fine-tune their victim model or for general experiment reproduction.
Hardware Specification | Yes | We conduct an efficiency evaluation presented in Table 14 with an A6000 GPU.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | To train a stolen model, we employ comparable hyperparameters to those used in training the victim models, including a batch size of either 64 or 256, an initial learning rate of 0.0001, and the Adam optimizer. In instances of stealing from the ImageNet victim model, we opt for a larger learning rate of 0.1 or 1.0 and a batch size ranging from 256.
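For orientation, the Experiment Setup row quoted above describes the attacker-side (model-stealing) training that the defense is evaluated against, not the Bayesian active watermarking defense itself. The sketch below is a minimal, assumption-laden illustration of that setup: it loads one of the cited public datasets (CIFAR10 via torchvision) as query data and trains a surrogate model with the reported hyperparameters (Adam optimizer, initial learning rate 0.0001, batch size 256). The surrogate architecture (ResNet-18), the epoch count, and the `query_victim` helper are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of the model-stealing training loop described in the
# Experiment Setup row. Architecture, epoch count, and the soft-label
# distillation objective are assumptions made for illustration only.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18


def query_victim(victim, x):
    """Placeholder: query the (possibly watermarked) victim model for soft labels."""
    with torch.no_grad():
        return F.softmax(victim(x), dim=1)


def train_stolen_model(victim, epochs=10, batch_size=256, lr=1e-4):
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Query data drawn from a public dataset cited in the paper; CIFAR10 here,
    # MNIST/CIFAR100 are analogous (Mini-ImageNet requires a separate download).
    query_set = datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transforms.ToTensor())
    loader = DataLoader(query_set, batch_size=batch_size, shuffle=True)

    # Reported hyperparameters: Adam optimizer, initial learning rate 0.0001,
    # batch size 64 or 256 (256 used here).
    stolen = resnet18(num_classes=10).to(device)
    optimizer = torch.optim.Adam(stolen.parameters(), lr=lr)

    victim = victim.to(device).eval()
    for _ in range(epochs):
        for x, _ in loader:  # ground-truth labels are unused; only victim outputs are
            x = x.to(device)
            targets = query_victim(victim, x)
            loss = F.kl_div(F.log_softmax(stolen(x), dim=1), targets,
                            reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return stolen
```

A call such as `train_stolen_model(victim_model)` with a trained (and watermarked) victim would then produce the surrogate whose accuracy the paper's defense aims to degrade.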