Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Defense against Model Extraction Attack by Bayesian Active Watermarking
Authors: Zhenyi Wang, Yihan Wu, Heng Huang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically conduct extensive experiments across various model extraction settings and datasets to protect the victim model, which is trained using either supervised learning or self-supervised learning. The outcomes reveal that, in contrast to the SOTA defensive training method (Wang et al., 2023), our approach requires only minimal finetuning of the victim model, reducing re-training costs by 87%. Additionally, it achieves a 17×–172× speedup during inference compared to (Orekondy et al., 2020; Mazeika et al., 2022). Furthermore, our approach surpasses other SOTA defense methods by up to 12% across various query budgets. Meanwhile, we conduct theoretical analysis to provide the performance guarantee for our proposed method. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, USA. |
| Pseudocode | Yes | Algorithm 1: Active watermarking for model extraction defense. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Datasets. We assess various defense methods using datasets such as MNIST, CIFAR10, CIFAR100 (Krizhevsky, 2009), and MiniImageNet (Vinyals et al., 2016). |
| Dataset Splits | No | The paper mentions 'in-distribution training data' and 'synthetic OOD data' but does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for the datasets used to fine-tune their victim model or for general experiment reproduction. |
| Hardware Specification | Yes | We conduct an efficiency evaluation, presented in Table 14, with an A6000 GPU. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | To train a stolen model, we employ comparable hyperparameters to those used in training the victim models, including a batch size of either 64 or 256, an initial learning rate of 0.0001, and the Adam optimizer. In instances of stealing from the ImageNet victim model, we opt for a larger learning rate of 0.1 or 1.0 and a batch size ranging from 256. |
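Because the paper does not release code, anyone attempting to reproduce the extraction experiments has to reconstruct the stolen-model training settings from the reported values. The sketch below records those values as a small configuration and enumerates the candidate learning-rate and batch-size combinations. It is not the authors' code: the names `StolenModelConfig`, `stolen_model_config`, and `candidate_settings` are hypothetical. It also assumes that the ImageNet case keeps the Adam optimizer, and it records only a batch size of 256 for that case because no upper bound is stated.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class StolenModelConfig:
    """Reported hyperparameter options for training the stolen (surrogate) model."""
    optimizer: str
    learning_rates: Tuple[float, ...]
    batch_sizes: Tuple[int, ...]


def stolen_model_config(victim_dataset: str) -> StolenModelConfig:
    """Hypothetical helper: map the victim's dataset to the reported settings."""
    if victim_dataset.lower() == "imagenet":
        # Reported: larger learning rate of 0.1 or 1.0, batch size "ranging from 256".
        # Assumption: the optimizer stays Adam; only the stated lower bound 256 is kept.
        return StolenModelConfig("Adam", (0.1, 1.0), (256,))
    # Default case: Adam, initial learning rate 1e-4, batch size 64 or 256.
    return StolenModelConfig("Adam", (1e-4,), (64, 256))


def candidate_settings(cfg: StolenModelConfig) -> List[Tuple[float, int]]:
    """Enumerate every (learning rate, batch size) pair to try in a reproduction sweep."""
    return list(product(cfg.learning_rates, cfg.batch_sizes))


if __name__ == "__main__":
    for name in ("CIFAR10", "ImageNet"):
        cfg = stolen_model_config(name)
        print(name, cfg.optimizer, candidate_settings(cfg))
```

Enumerating the full grid reflects the fact that the paper reports alternatives ("64 or 256", "0.1 or 1.0") rather than a single setting per experiment, so a reproduction would likely need to try each combination.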