Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Thieves on Sesame Street! Model Extraction of BERT-based APIs

Authors: Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, Mohit Iyyer

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model. Assuming that both the adversary and victim model fine-tune a large pretrained language model such as BERT (Devlin et al., 2019), we show that the adversary does not need any real training data to successfully mount the attack.
Researcher Affiliation | Collaboration | Kalpesh Krishna (CICS, UMass Amherst); Gaurav Singh Tomar (Google Research); Ankur P. Parikh (Google Research); Nicolas Papernot (Google Research); Mohit Iyyer (CICS, UMass Amherst)
Pseudocode | No | The paper describes methods and processes in text, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All the code necessary to reproduce experiments in this paper can be found at https://github.com/google-research/language/tree/master/language/bert_extraction.
Open Datasets | Yes | NLP tasks: We extract models on four diverse NLP tasks that have different kinds of input and output spaces: (1) binary sentiment classification using SST2 (Socher et al., 2013),... (2) ternary natural language inference (NLI) classification using MNLI (Williams et al., 2018),... (3) extractive question answering (QA) using SQuAD 1.1 (Rajpurkar et al., 2016),... and (4) boolean question answering using BoolQ (Clark et al., 2019)...
Dataset Splits | No | The paper frequently mentions evaluating on the "original development set" (e.g., "Accuracy of the extracted models on the original development set"). However, it does not provide specific percentages or counts for training/validation/test splits, nor does it cite a source that explicitly defines these splits for reproducibility.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud instance specifications.
Software Dependencies | No | The paper mentions BERT and XLNet models and general software like Python, but it does not specify concrete software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup | No | The paper states "We train five victim SQuAD models on the original training data with identical hyperparameters, varying only the random seed", indicating hyperparameters were used, but does not list the specific values for these hyperparameters (e.g., learning rate, batch size, optimizer settings). Table 9 mentions "Epochs" as a hyperparameter, but this is not comprehensive.
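To make the attack setting summarized above concrete (query-only access, no real training data), here is a minimal toy sketch of model extraction. It is not the paper's BERT pipeline: the victim is a hidden linear classifier, the adversary labels random inputs through the query interface and fits a local copy with perceptron updates, and all names (`victim_predict`, `extract_model`, `agreement`) are illustrative assumptions.

```python
import random

def victim_predict(x):
    """Black-box victim: the adversary sees only labels, never these weights."""
    w = [2.0, -1.0]  # hidden decision boundary
    return 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0

def extract_model(query_fn, n_queries=2000, epochs=20, lr=0.1, seed=0):
    """Train a local copy using only query access.

    The adversary needs no real training data: it generates random
    2-D inputs and labels them by querying the victim API.
    """
    rng = random.Random(seed)
    queries = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_queries)]
    labeled = [(x, query_fn(x)) for x in queries]  # the only victim interaction

    w = [0.0, 0.0]  # local copy's weights, trained via perceptron updates
    for _ in range(epochs):
        for x, y in labeled:
            pred = 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0
            if pred != y:
                sign = 1 if y == 1 else -1
                w[0] += lr * sign * x[0]
                w[1] += lr * sign * x[1]
    return w

def agreement(w, query_fn, n=1000, seed=1):
    """Fraction of fresh random inputs where the copy matches the victim."""
    rng = random.Random(seed)
    match = 0
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        pred = 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0
        match += int(pred == query_fn(x))
    return match / n
```

On this separable toy problem the extracted copy agrees with the victim on the vast majority of held-out random inputs, mirroring the paper's finding that nonsensical (here: random) queries suffice to clone a model's behavior.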