Stealing part of a production language model
Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. In order to visualize the intuition behind this attack, Figure 1 illustrates an attack against the Pythia-1.4B LLM. Here, we plot the magnitude of the singular values of Q as we send an increasing number n of queries to the model. We now analyze the efficacy of this attack across a wider range of models: GPT-2 (Radford et al., 2019) Small and XL, Pythia (Biderman et al., 2023) 1.4B and 6.9B, and LLaMA (Touvron et al., 2023) 7B and 65B. The results are in Table 2: our attack recovers the embedding size nearly perfectly, with an error of 0 or 1 in five out of six cases. Evaluation. We now study the efficacy of our practical stealing attack. |
| Researcher Affiliation | Collaboration | 1Google DeepMind 2ETH Zurich 3University of Washington 4OpenAI 5McGill University. |
| Pseudocode | Yes | Algorithm 1 Hidden-Dimension Extraction Attack (see the sketch below the table) |
| Open Source Code | Yes | We release supplementary code that deals with testing these attacks without direct API queries at https://github.com/dpaleka/stealing-part-lm-supplementary. |
| Open Datasets | Yes | We now analyze the efficacy of this attack across a wider range of models: GPT-2 (Radford et al., 2019) Small and XL, Pythia (Biderman et al., 2023) 1.4B and 6.9B, and LLaMA (Touvron et al., 2023) 7B and 65B. |
| Dataset Splits | No | The paper does not describe training/validation/test splits for its own experiments because it attacks existing pretrained models rather than training new ones, so the notion of a validation split does not apply to its experimental setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "bitsandbytes (Dettmers et al., 2022)" for quantization, but does not specify its version or the versions of other key software components needed for reproducibility. |
| Experiment Setup | No | The paper describes the attack methodology and reports attack parameters such as query counts and monetary cost, but it does not detail an experimental setup in terms of hyperparameters or system-level training settings, since it attacks pre-existing models rather than training new ones. |
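
For context on the quoted pseudocode and experiments, below is a minimal sketch (not the authors' released code) of the intuition behind the hidden-dimension extraction attack: query the model many times, stack the resulting full next-token logit vectors into a matrix Q, and read off the hidden dimension from the point where the singular values of Q collapse. The sketch assumes local white-box access to GPT-2 Small via Hugging Face transformers and uses a crude largest-gap heuristic; the actual attack instead reconstructs full logit vectors from restricted API responses, which this sketch omits.

```python
# Minimal sketch of hidden-dimension extraction via SVD of stacked logit
# vectors. Assumes local access to GPT-2 Small (hidden size 768) through
# Hugging Face transformers; this is an illustration, not the paper's code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

n_queries = 1024  # comfortably more queries than the true hidden size

logit_rows = []
with torch.no_grad():
    for _ in range(n_queries):
        # A random single-token prompt stands in for the paper's random prefixes.
        token_id = torch.randint(0, tokenizer.vocab_size, (1, 1))
        logits = model(token_id).logits[0, -1, :]  # full next-token logit vector
        logit_rows.append(logits)

# Q has shape (n_queries, vocab_size) but rank at most h, the hidden dimension,
# because every logit vector is a linear image of an h-dimensional hidden state.
Q = torch.stack(logit_rows)
singular_values = torch.linalg.svdvals(Q)

# Estimate h as the index of the largest drop between consecutive log singular
# values (a rough stand-in for the cutoff used in the paper's Algorithm 1).
log_sv = torch.log(singular_values + 1e-12)
gaps = log_sv[:-1] - log_sv[1:]
estimated_hidden_dim = int(torch.argmax(gaps)) + 1
print(f"Estimated hidden dimension: {estimated_hidden_dim}")  # expect ~768
```

This mirrors the behavior shown in the paper's Figure 1: once the number of queries exceeds the hidden size, the singular values of Q drop sharply at the hidden dimension, which is why the attack recovers the embedding size with an error of 0 or 1 in most cases.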