Probabilistically Robust Watermarking of Neural Networks

Authors: Mikhail Pautov, Nikita Bogdanov, Stanislav Pyatkin, Oleg Rogov, Ivan Oseledets

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on multiple benchmarks and show that our approach outperforms current state-of-the-art watermarking techniques in all considered experimental setups." (Section 5, Experiments)
Researcher Affiliation | Collaboration | Mikhail Pautov (1,2,3), Nikita Bogdanov (2), Stanislav Pyatkin (2), Oleg Rogov (1,2), and Ivan Oseledets (1,2). Affiliations: (1) Artificial Intelligence Research Institute, Moscow, Russia; (2) Skolkovo Institute of Science and Technology, Moscow, Russia; (3) ISP RAS Research Center for Trusted Artificial Intelligence, Moscow, Russia. Emails: {mikhail.pautov, nikita.bogdanov, stanislav.pyatkin}@skoltech.ru, {rogov, oseledets}@airi.net
Pseudocode | Yes | Algorithm 1 (Trigger set candidate). Input: hold-out dataset Dh, source model f. Output: trigger set candidate (x′, y′). (A Python sketch of this algorithm follows the table.)
1: while True do
2:   Sample (x1, y1), (x2, y2) ∼ U(Dh)
3:   if y1 ≠ y2 then
4:     Sample λ ∼ U(0, 1)
5:     x′ = λx1 + (1 − λ)x2
6:     y′ = f(x′)
7:     if y′ ≠ y1 and y′ ≠ y2 then
8:       return (x′, y′)
9:     end if
10:  end if
11: end while
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "In our experiments, we use CIFAR-10 and CIFAR-100 [Krizhevsky et al., 2009] as training datasets for our source model f." (A loading sketch follows the table.)
Dataset Splits | No | The paper mentions using CIFAR-10 and CIFAR-100 as training datasets and a hold-out test set Dh, but does not specify explicit training, validation, and test splits (e.g., percentages or exact sample counts for each split).
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using an SGD optimizer but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) needed to replicate the experimental environment.
Experiment Setup | Yes | We used the SGD optimizer with a learning rate of 0.1, weight decay of 0.5 × 10^-3, and momentum of 0.9. The parameter δ was varied in the range [0.5, 40], and τ was chosen from the set {0.1, 0.2, 1.0}. We tested different numbers of proxy models sampled from B_{δ,τ}(f) for verification: the parameter m was chosen from the set {1, 2, 4, 8, 16, 32, 64, 128, 256}. Unless stated otherwise, we use the following hyperparameter values in our experiments: the size of the verified trigger set is n = 100 for consistency with concurrent works, and the confidence level α for the Clopper-Pearson test from Eq. (9) is α = 0.05. We found that better transferability of the verified trigger set is achieved when no constraint on the performance of the proxy models is applied, so the performance threshold is set to τ = 1.0. Based on parameter tuning, we choose m = 64 and δ = 40.0 as the default parameters of the proxy set. Table 4 reports the parameter values used in each experiment. (A configuration sketch follows the table.)
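
As a reading aid for the Pseudocode row above, a minimal Python sketch of Algorithm 1 follows. The (image, label) pair layout of the hold-out set, the PyTorch-style prediction call, and the function name trigger_set_candidate are illustrative assumptions, not the authors' released code.

    import random

    import torch

    def trigger_set_candidate(holdout, source_model):
        # Sketch of Algorithm 1: interpolate two hold-out samples with different
        # labels and accept the mixture if the source model assigns it to a class
        # different from both parent labels.
        # `holdout` is assumed to be a list of (image_tensor, int_label) pairs and
        # `source_model` a classifier returning logits.
        while True:
            (x1, y1), (x2, y2) = random.sample(holdout, 2)   # (x1, y1), (x2, y2) ~ U(Dh)
            if y1 != y2:
                lam = random.random()                        # λ ~ U(0, 1)
                x_mix = lam * x1 + (1.0 - lam) * x2          # x′ = λ·x1 + (1 − λ)·x2
                with torch.no_grad():
                    y_mix = source_model(x_mix.unsqueeze(0)).argmax(dim=1).item()  # y′ = f(x′)
                if y_mix != y1 and y_mix != y2:
                    return x_mix, y_mix                      # accept (x′, y′)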
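
The Open Datasets row names CIFAR-10 and CIFAR-100 as training data. A standard torchvision loading sketch is shown below; the transform, the storage path, and the use of the official test split as the hold-out set Dh are assumptions for illustration, not the paper's reported preprocessing.

    import torchvision
    import torchvision.transforms as T

    transform = T.ToTensor()  # minimal preprocessing; the paper's exact transforms are not specified
    cifar10_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    cifar100_train = torchvision.datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
    # One possible choice for the hold-out set Dh used to build trigger candidates:
    cifar10_holdout = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)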
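
For the Experiment Setup row, the configuration sketch below wires up the quoted SGD hyperparameters and a Clopper-Pearson bound at α = 0.05 for n = 100 trigger samples. The placeholder model, the scipy-based implementation, and the choice of a one-sided lower bound are assumptions; Eq. (9) in the paper may define the interval differently.

    import torch
    from scipy.stats import beta

    model = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder for the source model f
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.5e-3, momentum=0.9)

    def clopper_pearson_lower(k, n, alpha=0.05):
        # One-sided Clopper-Pearson lower confidence bound on a success probability
        # after observing k successes in n trials (k = verified trigger samples).
        if k == 0:
            return 0.0
        return float(beta.ppf(alpha, k, n - k + 1))

    # Example: 97 of the n = 100 trigger samples verified on a proxy model.
    print(clopper_pearson_lower(k=97, n=100, alpha=0.05))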