Stealing part of a production language model

Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments. In order to visualize the intuition behind this attack, Figure 1 illustrates an attack against the Pythia-1.4B LLM. Here, we plot the magnitude of the singular values of Q as we send an increasing number n of queries to the model. We now analyze the efficacy of this attack across a wider range of models: GPT-2 (Radford et al., 2019) Small and XL, Pythia (Biderman et al., 2023) 1.4B and 6.9B, and LLaMA (Touvron et al., 2023) 7B and 65B. The results are in Table 2: our attack recovers the embedding size nearly perfectly, with an error of 0 or 1 in five out of six cases. Evaluation. We now study the efficacy of our practical stealing attack.
Researcher Affiliation | Collaboration | 1 Google DeepMind, 2 ETH Zurich, 3 University of Washington, 4 OpenAI, 5 McGill University.
Pseudocode | Yes | Algorithm 1: Hidden-Dimension Extraction Attack (a minimal sketch of the idea appears after this table).
Open Source Code | Yes | We release supplementary code that deals with testing these attacks without direct API queries at https://github.com/dpaleka/stealing-part-lm-supplementary.
Open Datasets | Yes | We now analyze the efficacy of this attack across a wider range of models: GPT-2 (Radford et al., 2019) Small and XL, Pythia (Biderman et al., 2023) 1.4B and 6.9B, and LLaMA (Touvron et al., 2023) 7B and 65B.
Dataset Splits | No | The paper does not describe training/validation/test splits for its own experimental setup, since it attacks existing models rather than training new ones. The notion of a validation split for training data therefore does not apply to its experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions using "bitsandbytes (Dettmers et al., 2022)" for quantization, but does not specify its version or the versions of other key software components, which reproducibility would require.
Experiment Setup | No | The paper describes the attack methodology and parameters such as query counts and cost, but it does not detail an experimental setup in terms of hyperparameters or system-level training settings, since it attacks pre-existing models rather than training new ones.
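
The hidden-dimension extraction step quoted above rests on a simple linear-algebra fact: each logit vector the API returns is W @ g(prompt), where W is the l x h final projection matrix, so a stack of n > h logit vectors has rank at most h and its singular values collapse after index h. The following is a minimal, self-contained sketch of that idea, assuming a synthetic stand-in model instead of paid API queries; the function name, the log-gap cutoff heuristic, and the toy dimensions are illustrative assumptions, not the authors' exact Algorithm 1 implementation.

import numpy as np

def estimate_hidden_dim(logit_matrix: np.ndarray) -> int:
    """Estimate the hidden dimension h from an (n x l) matrix of observed logits.

    Because each row is W @ g(prompt) with W of shape (l, h), the stacked
    matrix has rank at most h, so its singular values drop sharply after
    index h.
    """
    s = np.linalg.svd(logit_matrix, compute_uv=False)
    log_s = np.log(s + 1e-30)
    # Heuristic cut: the largest drop between consecutive log-singular values
    # (the paper analyses this thresholding step more carefully).
    return int(np.argmax(log_s[:-1] - log_s[1:]) + 1)

# Toy check: vocabulary size l = 4096, hidden size h = 256, n = 512 > h queries.
rng = np.random.default_rng(0)
l, h, n = 4096, 256, 512
W = rng.standard_normal((l, h))   # final projection layer (unknown to the attacker)
H = rng.standard_normal((h, n))   # hidden states for n distinct prompts
Q = (W @ H).T                     # the n x l matrix of logit vectors the attacker observes
print(estimate_hidden_dim(Q))     # recovers 256

On such a synthetic full-rank model the cut is exact; the paper's Table 2 result quoted above (embedding size recovered with an error of 0 or 1 in five of six models) is what the analogous procedure achieves against real production-scale models.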