Teach LLMs to Phish: Stealing Private Information from Language Models
Authors: Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, Prateek Mittal
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: Our new neural phishing attack has 3 phases, using standard setups for each. Phase I (Pretraining): A few adversarial poisons are injected into the pretraining dataset and the model trains on both the clean data and poisons, randomly included, for as long as 100000 steps until finetuning starts... Figure 2: Random poisoning can extract secrets. The poisons are random sentences. 15% of the time we extract the full 12-digit number... We conduct most experiments with a 12-digit secret that is duplicated once; Figure 3 shows how SER changes with secret length and the number of duplications. |
| Researcher Affiliation | Collaboration | Ashwinee Panda (Princeton University), Christopher A. Choquette-Choo (Google DeepMind), Zhengming Zhang (Southeast University), Yaoqing Yang (Dartmouth College), Prateek Mittal (Princeton University) |
| Pseudocode | No | The paper describes the attack phases in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We are not currently working on getting approval to release the code due to concerns over responsible disclosure. |
| Open Datasets | Yes | To this end, we use Enron Emails and Wikitext as our finetuning datasets. ... We then train for a varying number of steps on clean data on Wikitext (Merity et al., 2016) |
| Dataset Splits | No | The paper describes its evaluation methodology (e.g., 100 seeds, bootstrapped confidence intervals) and dataset usage, but does not explicitly provide training/validation/test dataset splits as percentages or counts for its experimental data. |
| Hardware Specification | Yes | In Figure 4 we report the SER across three model sizes that can be trained on a single A100: 1.4b, 2.8b, 6.9b parameters. |
| Software Dependencies | No | The paper mentions the use of 'Huggingface Trainer' and 'Pythia family' models, but does not provide specific version numbers for these or other software dependencies like Python or PyTorch. |
| Experiment Setup | Yes | All gradient updates use the AdamW optimizer with a learning rate of 5e-5, all other default optimizer parameters, and a batch size of 64. ... We use a 2.8b parameter model. ... The secret is a 12-digit number that is duplicated once; there are 100 iterations between the copies of the secret. |
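
The Software Dependencies and Experiment Setup rows above name the Huggingface Trainer, the Pythia model family, AdamW with a learning rate of 5e-5, and a batch size of 64, but the paper does not pin library versions, dataset splits, or step counts. The sketch below is a minimal reconstruction of that finetuning configuration under stated assumptions: the exact Pythia checkpoint, the Wikitext config and split, the sequence length, and the number of steps are all guesses, not values confirmed by the paper.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

# Pythia family is named in the paper; this particular checkpoint is an assumption.
model_name = "EleutherAI/pythia-2.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Pythia tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wikitext is one of the finetuning datasets named in the paper; the exact
# config/split used there is not specified, so this choice is an assumption.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
raw = raw.filter(lambda x: len(x["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="finetune-out",
    per_device_train_batch_size=64,  # batch size 64, as quoted
    learning_rate=5e-5,              # AdamW with lr 5e-5 and default optimizer parameters
    max_steps=1000,                  # the paper varies the number of finetuning steps
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```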
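
The Research Type and Experiment Setup rows also describe how the secret is placed: a 12-digit number duplicated once, with 100 iterations of clean data between the two copies. The snippet below is a purely illustrative sketch of interleaving such a secret into a finetuning stream; the carrier sentence, the insertion offsets, and the batch representation are hypothetical and not specified in the quoted excerpts.

```python
import random

# Generate a random 12-digit secret, matching the secret length used in most experiments.
SECRET = "".join(random.choice("0123456789") for _ in range(12))
secret_record = f"My credit card number is {SECRET}."  # hypothetical carrier text
GAP = 100  # iterations of clean data between the two copies of the secret

def build_finetune_stream(clean_batches):
    """Yield clean batches, inserting the secret record twice, GAP steps apart."""
    insert_at = {0, GAP}  # offsets of the two copies; the exact placement is an assumption
    for step, batch in enumerate(clean_batches):
        if step in insert_at:
            yield batch + [secret_record]
        else:
            yield batch
```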