Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Beyond Labeling Oracles: What does it mean to steal ML models?
Authors: Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively experiment with ME attacks using additional control mechanisms to disambiguate the effect of prior knowledge from the attack policy. We find that performance of current ME attacks is dominated by the IND data... To answer the questions described above, we evaluate a range of ME attacks on common vision and language benchmarks. We measure the attacker's performance as the accuracy difference between the victim and the attacker on the original task. As seen in Figure 2, and as evident in the large number of works in this field, there are many settings in which an ME adversary can obtain a model that has a desirable performance over the task of interest. |
| Researcher Affiliation | Academia | Avital Shafran EMAIL The Hebrew University of Jerusalem; Ilia Shumailov EMAIL University of Oxford; Murat A. Erdogdu EMAIL University of Toronto & Vector Institute; Nicolas Papernot EMAIL University of Toronto & Vector Institute |
| Pseudocode | No | The paper describes its methodology in prose and does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | For vision tasks, we evaluate the CIFAR-10 dataset (Krizhevsky et al., 2009)... We additionally evaluate the Indoor67 (Quattoni & Torralba, 2009), CUBS200 (Wah et al., 2011) and Caltech256 (Griffin et al., 2007) datasets... NLP. We evaluate the MNLI classification task (Williams et al., 2018)... |
| Dataset Splits | Yes | For vision tasks, we evaluate the CIFAR-10 dataset (Krizhevsky et al., 2009), for which we follow the setting and training details described by the state-of-the-art DFME attack (Truong et al., 2021). We additionally evaluate the Indoor67 (Quattoni & Torralba, 2009), CUBS200 (Wah et al., 2011) and Caltech256 (Griffin et al., 2007) datasets, and follow the setting and training details described by the Knockoff Nets attack (Orekondy et al., 2019a)... NLP. We evaluate the MNLI classification task (Williams et al., 2018), in a standard setting (Krishna et al., 2019). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It mentions general cloud providers (Google Cloud, Amazon Sagemaker) in the context of labeling costs, but not for the experimental setup, and acknowledges sponsors (Amazon, Apple, Intel) without specifying their provided resources for computation. |
| Software Dependencies | No | The paper mentions models like BERT-base and ResNet34, and fine-tuning, but does not provide specific version numbers for software dependencies, libraries, or frameworks used for implementation (e.g., PyTorch, TensorFlow, scikit-learn, with their corresponding versions). |
| Experiment Setup | Yes | For the victim model, we use the publicly-released, pre-trained 12-layer transformer BERT-base model (Devlin et al., 2018), and fine-tune it for 3 epochs with a learning rate of 0.00003. ... In order to calibrate R to better distinguish between IND and OOD samples, we increase the model's Softmax temperature to 2. ... For each class, we create m = 5 anchor points... |
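The Experiment Setup excerpt mentions raising the model's Softmax temperature to 2 to better separate in-distribution from out-of-distribution samples. As a minimal sketch of temperature-scaled Softmax, assuming a plain list of logits (the function name and example values below are illustrative, not from the paper):

```python
import math

def softmax_with_temperature(logits, temperature=2.0):
    """Temperature-scaled Softmax: dividing logits by T > 1 flattens
    the output distribution, lowering the model's peak confidence."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# The same logits yield a softer distribution at T=2 than at T=1.
logits = [3.0, 1.0, 0.5]
p_t1 = softmax_with_temperature(logits, temperature=1.0)
p_t2 = softmax_with_temperature(logits, temperature=2.0)
assert max(p_t2) < max(p_t1)  # higher temperature lowers peak confidence
```

Flattening confidences this way can make a calibration threshold (such as the paper's detector R) less sensitive to the overconfident predictions typical of unscaled Softmax outputs.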