Grey-box Extraction of Natural Language Models

Authors: Santiago Zanella-Béguelin, Shruti Tople, Andrew Paverd, Boris Köpf

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our attacks on LLMs of various sizes and architectures, fine-tuned to different downstream tasks. In particular, we study the effect on accuracy of the extracted model of using different kinds and amounts of API queries, and of using different learning rates for fine-tuning the encoder. Our key findings are: When the target model's base layers are frozen during fine-tuning (i.e., the attacker can get the exact embedding of any input), the algebraic attack is extremely effective. With only twice as many queries as the dimension of the embedding space (e.g., 1536 for BERT-base), we extract models that achieve 100% fidelity with the target, for all model sizes and tasks. (A sketch of this algebraic recovery appears below the table.)
Researcher Affiliation | Industry | ¹Microsoft Research, ²Microsoft Security Response Center. Correspondence to: Santiago Zanella-Béguelin <santiago@microsoft.com>.
Pseudocode | No | The paper describes attack steps in numbered lists (e.g., '1. Choose distinct inputs...'), but these are embedded in the text and are not presented as formal pseudocode or algorithm blocks with dedicated labels.
Open Source Code | No | The paper states 'Our core attack logic is simple and is implemented in only 20 lines of code with around 500 lines of boilerplate.' but does not provide any link or explicit statement about open-sourcing this code.
Open Datasets | Yes | We evaluate our algebraic extraction attacks on two text classification tasks from the GLUE benchmark: SST-2 (Socher et al., 2013) and MNLI (Williams et al., 2017). (A dataset-loading sketch appears below the table.)
Dataset Splits | Yes | We measure the success of the attack in terms of the replica's accuracy and agreement with the target model, both on the validation set of the task and on a different set of random challenge inputs. (A sketch of these two metrics appears below the table.)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019) and the Hugging Face Transformers library (Wolf et al., 2020)' but does not provide specific version numbers for these software dependencies, which are required for reproducibility. (An illustrative pinning sketch appears below the table.)
Experiment Setup | Yes | We vary the learning rate (η) used to fine-tune the base layers from 0 to 2 × 10⁻⁵, while the classifier layer is always trained with a fixed learning rate of 2 × 10⁻⁵. All our models are fine-tuned for 3 epochs... For learning-based extraction, we fine-tune for 3 epochs the base model and any additional layers in the classification head using the AdamW optimizer with initial learning rate 3 × 10⁻⁵. (A configuration sketch appears below the table.)
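
The sketches below illustrate the techniques and settings the rows above refer to; none of them is the authors' code. First, the key finding in the Research Type row: when the encoder is frozen, the classification head is an affine map of embeddings the attacker can compute locally, so it can be recovered by solving one linear system. A minimal NumPy sketch, assuming the API returns raw logits; all names here are hypothetical:

```python
import numpy as np

# Hypothetical grey-box setting: the victim's head computes logits = E @ W + b,
# where E are embeddings from a frozen, publicly known encoder, so the
# attacker can compute E for any chosen input and observe the logits.
rng = np.random.default_rng(0)
d, k = 768, 3                      # embedding dim (BERT-base) and #classes
W_true = rng.normal(size=(d, k))
b_true = rng.normal(size=k)

def query_logits(E):
    """Stand-in for the victim API: logits for the given embeddings."""
    return E @ W_true + b_true

# Algebraic attack sketch: query d + 1 inputs with linearly independent
# embeddings and solve a linear system for the head's weights and bias.
E = rng.normal(size=(d + 1, d))            # embeddings of chosen inputs
Y = query_logits(E)                        # observed logits, shape (d+1, k)
A = np.hstack([E, np.ones((d + 1, 1))])    # augment with a bias column
theta, *_ = np.linalg.lstsq(A, Y, rcond=None)
W_hat, b_hat = theta[:-1], theta[-1]

assert np.allclose(W_hat, W_true, atol=1e-6)   # exact functional fidelity
assert np.allclose(b_hat, b_true, atol=1e-6)
```

Note that d + 1 independent queries are the information-theoretic minimum for an affine head; the 2×d budget quoted in the row refers to the paper's practical setting, which leaves slack for non-independent inputs.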
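
The Open Datasets row names SST-2 and MNLI from GLUE. One way to obtain them is via the Hugging Face datasets library (the library's standard API, not the paper's own tooling):

```python
from datasets import load_dataset

# GLUE tasks used by the paper, as hosted on the Hugging Face Hub.
sst2 = load_dataset("glue", "sst2")   # splits: train / validation / test
mnli = load_dataset("glue", "mnli")   # splits include validation_matched
                                      # and validation_mismatched

print(sst2["validation"].num_rows)            # SST-2 validation size (872)
print(mnli["validation_matched"].num_rows)    # MNLI matched validation (9815)
```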
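
The Dataset Splits row measures both replica accuracy and target/replica agreement, on the task's validation set and on unlabeled random challenge inputs. A sketch of those two metrics in PyTorch, assuming Hugging Face-style models whose outputs carry a `.logits` tensor:

```python
import torch

@torch.no_grad()
def accuracy_and_agreement(target, replica, loader):
    """Replica accuracy (where labels exist) and target/replica agreement.

    `loader` yields dicts of input tensors, optionally with a 'labels' key
    (absent for random challenge inputs, where only agreement is defined).
    """
    correct = agree = total = labeled = 0
    for batch in loader:
        labels = batch.pop("labels", None)
        t_pred = target(**batch).logits.argmax(dim=-1)
        r_pred = replica(**batch).logits.argmax(dim=-1)
        agree += (t_pred == r_pred).sum().item()
        total += t_pred.numel()
        if labels is not None:
            correct += (r_pred == labels).sum().item()
            labeled += labels.numel()
    accuracy = correct / labeled if labeled else float("nan")
    return accuracy, agree / total
```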
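
The Software Dependencies row flags the missing version pins. A reproducible setup would record them explicitly; the pins below are illustrative placeholders contemporary with ICML 2021, not versions stated by the authors:

```
# requirements.txt sketch. The paper names the libraries but not their
# versions; these pins are illustrative placeholders, NOT from the paper.
torch==1.7.1
transformers==4.2.2
datasets==1.2.1
```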
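
Finally, the Experiment Setup row fixes the optimizer and learning rates. A sketch of the two-rate regime using per-parameter-group learning rates in PyTorch; the quoted rates come from the row above, while the model name and training loop are placeholders:

```python
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification

# Placeholder model for a binary task such as SST-2.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

base_lr = 2e-5   # varied from 0 to 2e-5 for the target's base layers;
                 # base_lr = 0 is the frozen-encoder regime in which the
                 # algebraic attack recovers the head exactly.
head_lr = 2e-5   # the classifier layer is always trained at 2e-5
optimizer = AdamW([
    {"params": model.bert.parameters(), "lr": base_lr},
    {"params": model.classifier.parameters(), "lr": head_lr},
])

# All models are fine-tuned for 3 epochs; for learning-based extraction the
# paper instead uses a single initial rate of 3e-5 for base model and head.
# for epoch in range(3):
#     for batch in train_loader:
#         loss = model(**batch).loss
#         loss.backward(); optimizer.step(); optimizer.zero_grad()
```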