Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stealing part of a production language model
Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. In order to visualize the intuition behind this attack, Figure 1 illustrates an attack against the Pythia-1.4b LLM. Here, we plot the magnitude of the singular values of Q as we send an increasing number n of queries to the model. We now analyze the efficacy of this attack across a wider range of models: GPT-2 (Radford et al., 2019) Small and XL, Pythia (Biderman et al., 2023) 1.4B and 6.9B, and LLaMA (Touvron et al., 2023) 7B and 65B. The results are in Table 2: our attack recovers the embedding size nearly perfectly, with an error of 0 or 1 in five out of six cases. Evaluation. We now study the efficacy of our practical stealing attack. |
| Researcher Affiliation | Collaboration | 1 Google DeepMind, 2 ETH Zurich, 3 University of Washington, 4 OpenAI, 5 McGill University. |
| Pseudocode | Yes | Algorithm 1 Hidden-Dimension Extraction Attack |
| Open Source Code | Yes | We release supplementary code that deals with testing these attacks without direct API queries at https://github.com/dpaleka/stealing-part-lm-supplementary. |
| Open Datasets | Yes | We now analyze the efficacy of this attack across a wider range of models: GPT-2 (Radford et al., 2019) Small and XL, Pythia (Biderman et al., 2023) 1.4B and 6.9B, and LLaMA (Touvron et al., 2023) 7B and 65B. |
| Dataset Splits | No | The paper does not explicitly describe training/validation/test splits for its own experimental setup, as it focuses on attacking existing models rather than training new ones. Therefore, the concept of a 'validation split' as typically applied to training data is not relevant to their direct experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "bitsandbytes (Dettmers et al., 2022)" for quantization, but does not specify its version number or versions for other key software components, which is required for reproducibility. |
| Experiment Setup | No | The paper describes the attack methodology and parameters like cost and queries, but it does not detail experimental setup in terms of hyperparameters or system-level training settings, as it is attacking pre-existing models rather than training new ones. |
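
The Research Type entry quotes the paper's description of plotting the singular values of Q as the number of queries n grows. The idea can be illustrated with a minimal sketch on a synthetic model; this is not the authors' Algorithm 1 or their released code. The helper names `toy_logits` and `estimate_hidden_dim`, the random linear "model", and the choice to pick the largest log-gap between consecutive singular values are all assumptions made for illustration. The toy setup also assumes full, noise-free logit vectors, which a production API would not normally expose.

```python
import numpy as np


def toy_logits(n_queries: int, hidden_dim: int, vocab_size: int, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a model: logits = hidden_states @ W.T.

    Each row is the logit vector returned for one query. Because every row
    lies in the column space of W, the matrix has rank <= hidden_dim.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((vocab_size, hidden_dim))   # output projection
    H = rng.standard_normal((n_queries, hidden_dim))    # final hidden states
    return H @ W.T


def estimate_hidden_dim(logits: np.ndarray) -> int:
    """Estimate the hidden dimension as the position of the largest drop
    between consecutive singular values (measured on a log scale)."""
    s = np.linalg.svd(logits, compute_uv=False)  # sorted in descending order
    if s[0] == 0:
        return 0
    # Clamp numerically-zero values relative to the largest singular value,
    # so that noise in the tail cannot create a spuriously large gap.
    floor = s[0] * np.finfo(s.dtype).eps
    log_s = np.log(np.maximum(s, floor))
    gaps = log_s[:-1] - log_s[1:]
    return int(np.argmax(gaps)) + 1


# The number of queries must exceed the hidden dimension for the drop to appear.
Q = toy_logits(n_queries=96, hidden_dim=32, vocab_size=200)
estimated_dim = estimate_hidden_dim(Q)
print(estimated_dim)
```

With fewer queries than the hidden dimension, Q has full row rank and no drop is visible, which is why the quoted passage tracks the singular values as n increases.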