Thieves on Sesame Street! Model Extraction of BERT-based APIs
Authors: Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, Mohit Iyyer
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model. Assuming that both the adversary and victim model fine-tune a large pretrained language model such as BERT (Devlin et al., 2019), we show that the adversary does not need any real training data to successfully mount the attack. (A minimal sketch of this extraction loop appears after the table.) |
| Researcher Affiliation | Collaboration | Kalpesh Krishna (CICS, UMass Amherst, kalpesh@cs.umass.edu); Gaurav Singh Tomar (Google Research, gtomar@google.com); Ankur P. Parikh (Google Research, aparikh@google.com); Nicolas Papernot (Google Research, papernot@google.com); Mohit Iyyer (CICS, UMass Amherst, miyyer@cs.umass.edu) |
| Pseudocode | No | The paper describes methods and processes in text, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All the code necessary to reproduce experiments in this paper can be found in https://github.com/google-research/language/tree/master/language/bert_extraction. |
| Open Datasets | Yes | NLP tasks: We extract models on four diverse NLP tasks that have different kinds of input and output spaces: (1) binary sentiment classification using SST2 (Socher et al., 2013),... (2) ternary natural language inference (NLI) classification using MNLI (Williams et al., 2018),... (3) extractive question answering (QA) using SQuAD 1.1 (Rajpurkar et al., 2016),... and (4) boolean question answering using BoolQ (Clark et al., 2019)... |
| Dataset Splits | No | The paper frequently mentions evaluating on the "original development set" (e.g., "Accuracy of the extracted models on the original development set"). However, it does not provide specific percentages or counts for training/validation/test splits, nor does it cite a source that explicitly defines these splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions BERT and XLNet models and general software like Python, but it does not specify concrete software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | No | The paper states "We train five victim SQuAD models on the original training data with identical hyperparameters, varying only the random seed", indicating that a fixed hyperparameter configuration was used, but it does not list the specific values (e.g., learning rate, batch size, optimizer settings). Table 9 mentions "Epochs" as a hyperparameter, but this is not comprehensive. |
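
To make the extraction setup quoted in the Research Type row concrete, the sketch below illustrates the attack loop the paper describes: generate nonsensical queries, label them with the victim API's outputs, and fine-tune a local BERT copy on the resulting transfer set. This is a minimal illustration under stated assumptions, not the paper's released implementation: `victim_predict` and `fine_tune` are hypothetical stand-ins for the victim API endpoint and standard BERT fine-tuning, and the toy vocabulary is invented.

```python
import random

# Hypothetical toy vocabulary for a sentiment task; the paper's RANDOM
# query generator samples words rather than using real task data.
WORDS = ["movie", "the", "awful", "brilliant", "plot", "acting"]

def random_query(length=8):
    """Build a nonsensical input by drawing random words."""
    return " ".join(random.choice(WORDS) for _ in range(length))

def extract(victim_predict, fine_tune, extracted_model, n_queries=10_000):
    """Label random queries with the victim's outputs, then fine-tune a
    local BERT copy on the resulting (input, label) transfer set.

    victim_predict and fine_tune are hypothetical stand-ins: the former
    for the victim API's query endpoint, the latter for standard BERT
    fine-tuning on labeled pairs.
    """
    transfer_set = []
    for _ in range(n_queries):
        x = random_query()
        y = victim_predict(x)  # only query access to the victim is needed
        transfer_set.append((x, y))
    fine_tune(extracted_model, transfer_set)
    return extracted_model
```

The point mirrored from the paper is that `random_query` draws from a word list rather than from real task data, so the adversary mounts the attack without any in-distribution training examples.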