Dataset Inference for Self-Supervised Models
Authors: Adam Dziedzic, Haonan Duan, Muhammad Ahmad Kaleem, Nikita Dhawan, Jonas Guan, Yannis Cattan, Franziska Boenisch, Nicolas Papernot
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive empirical results in the vision domain demonstrate that dataset inference is a promising direction for defending self-supervised models against model stealing. |
| Researcher Affiliation | Academia | University of Toronto and Vector Institute |
| Pseudocode | Yes | Algorithm 1 summarizes the stealing approach used by an adversary. |
| Open Source Code | No | The paper references 'an open-source PyTorch implementation of SimCLR' (https://github.com/kuangliu/pytorch-cifar), but this is a third-party tool used by the authors, not their own source code for the proposed defense. |
| Open Datasets | Yes | We evaluate our defense against encoder extraction attacks using five different vision datasets (CIFAR10, CIFAR100 [28], SVHN [34], STL10 [8], and ImageNet [11]). |
| Dataset Splits | Yes | For SVHN, we merge the original training and test splits, and use a randomly selected 80% as the training set and the remaining 20% as the test set. For SVHN and CIFAR10, we use 50% of the training set to train GMMs, and the remainder for evaluation. |
| Hardware Specification | No | The paper does not provide specific details on the GPU or CPU models used for the experiments, or any other hardware specifications. |
| Software Dependencies | No | The paper mentions using a 'PyTorch implementation' but does not specify its version number or any other software dependencies with version details. |
| Experiment Setup | Yes | We train GMMs with 10 components for SVHN and CIFAR10, and 50 components for ImageNet. In general, we observe that the larger the number of GMM components, the better the defense. For ImageNet, we restrict the covariance matrix to be diagonal for efficiency. For CIFAR10 and SVHN, we use the full covariance matrix. |
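The SVHN split procedure quoted in the table (merge the original train/test splits, re-split 80/20, then use half of the training portion for GMM training) can be sketched with index arrays alone. This is a minimal illustration, not the authors' code; the dataset sizes are the standard SVHN release counts, and the seed is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed (assumption)

# Merge the original SVHN train and test splits
# (73,257 + 26,032 are the standard SVHN release sizes).
n_total = 73257 + 26032
indices = rng.permutation(n_total)

# Randomly selected 80% as the training set, remaining 20% as the test set.
n_train = int(0.8 * n_total)
train_idx, test_idx = indices[:n_train], indices[n_train:]

# 50% of the training set is used to train GMMs, the remainder for evaluation.
n_gmm = n_train // 2
gmm_idx, eval_idx = train_idx[:n_gmm], train_idx[n_gmm:]
```

In practice these index arrays would be applied to the merged image tensor (e.g. via a torchvision `SVHN` dataset loaded with both splits), but the partitioning logic is the same.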
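The GMM configuration in the setup row (10 components with full covariance for SVHN/CIFAR10, 50 components with diagonal covariance for ImageNet) maps directly onto scikit-learn's `GaussianMixture`. A hedged sketch follows; the random features stand in for the encoder representations the paper actually fits on, and all variable names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder features; in the paper these would be self-supervised
# encoder representations of the training data.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 16))

# SVHN / CIFAR10 setting: 10 components, full covariance matrices.
gmm_small = GaussianMixture(n_components=10, covariance_type="full", random_state=0)
gmm_small.fit(features)

# ImageNet setting: 50 components, diagonal covariance for efficiency.
gmm_imagenet = GaussianMixture(n_components=50, covariance_type="diag", random_state=0)
# gmm_imagenet.fit(imagenet_features)  # hypothetical: fit on ImageNet representations

# Per-sample log-likelihoods under the fitted mixture, the quantity
# that density-based dataset inference tests would compare across splits.
log_likelihoods = gmm_small.score_samples(features)
```

The `covariance_type="diag"` choice trades modeling power for a parameter count linear (rather than quadratic) in the feature dimension, which is why it is the practical option at ImageNet scale.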