On the Exploitability of Instruction Tuning
Authors: Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. *(A generic poison-mixing sketch follows the table.)* |
| Researcher Affiliation | Collaboration | Manli Shu¹, Jiongxiao Wang², Chen Zhu³, Jonas Geiping¹, Chaowei Xiao², Tom Goldstein¹ (¹University of Maryland, ²University of Wisconsin-Madison, ³Google DeepMind) |
| Pseudocode | No | The paper describes the AutoPoison pipeline conceptually and visually in Figure 1, but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | The code for generating poisoned data and instruction tuning can be found via this anonymous link: https://tinyurl.com/mwxnm3t6. |
| Open Datasets | Yes | We use the English split of GPT-4-LLM [11], an open-source dataset of machine-generated instruction-following data. It consists of 52,000 training examples with GPT-4 [1] generated responses. ... We use databricks-dolly-15k [5], a dataset of 15,011 human-labeled instruction-following examples. |
| Dataset Splits | Yes | We use the English split of GPT-4-LLM [11]... as our training data... We evaluate the instruction-tuned models on databricks-dolly-15k [5], a dataset of 15,011 human-labeled instruction-following examples. |
| Hardware Specification | Yes | We fine-tune OPT-350M on a single RTX A5000 GPU with 24GB memory. ... OPT-1.3B models are fine-tuned on a single RTX A6000 GPU with 48GB memory. ... We fine-tune OPT-6.7B using 2 A100 GPUs with 40GB memory each |
| Software Dependencies | No | The paper mentions models such as Vicuna-7B and Llama and states that the code builds on stanford-alpaca, but it does not provide specific version numbers for general software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | Our models are trained for three epochs with an effective batch size of 128. We set the learning rate as 0.00002 with 0 weight decay. We use the cosine learning rate scheduler with a warmup ratio of 0.03. We use greedy decoding at inference. *(A configuration sketch follows the table.)* |
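
The Experiment Setup row lists the reported fine-tuning hyperparameters. Below is a minimal configuration sketch assuming the Hugging Face `transformers` Trainer stack (plausible, since the released code builds on stanford-alpaca, which uses it); the per-device batch size and gradient-accumulation split used to reach the effective batch size of 128 is an assumption not stated in the paper.

```python
# Hedged sketch of the reported fine-tuning configuration: 3 epochs,
# effective batch size 128, learning rate 2e-5, weight decay 0,
# cosine schedule with warmup ratio 0.03. Assumes the Hugging Face
# Trainer stack; the 4 x 32 batch split below is illustrative only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="opt-instruction-tuned",
    num_train_epochs=3,
    per_device_train_batch_size=4,    # assumed split: 4 per device ...
    gradient_accumulation_steps=32,   # ... x 32 accumulation = 128 effective
    learning_rate=2e-5,
    weight_decay=0.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
)
```

At inference, the reported greedy decoding corresponds to calling `model.generate(...)` with `do_sample=False` (and the default single beam) in the same stack.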
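
The Research Type row notes that the attack changes a model's behavior by poisoning only a small fraction of the training data. The snippet below is a generic illustration of injecting poisoned examples at a fixed ratio into a clean instruction-tuning set; it is not the paper's code, the `make_poisoned_example` and `load_gpt4_llm_english_split` helpers are hypothetical stand-ins for AutoPoison's actual poisoned-response generation and data loading, and the choice to replace (rather than append) clean examples is also an assumption.

```python
import random

def mix_in_poison(clean_examples, poisoned_examples, poison_ratio, seed=0):
    """Replace a `poison_ratio` fraction of a clean instruction-tuning set
    with poisoned examples. Generic illustration, not the paper's code."""
    rng = random.Random(seed)
    n_poison = int(len(clean_examples) * poison_ratio)
    mixed = list(clean_examples)
    # Overwrite a random subset of clean examples with poisoned ones.
    for i, j in enumerate(rng.sample(range(len(mixed)), n_poison)):
        mixed[j] = poisoned_examples[i % len(poisoned_examples)]
    rng.shuffle(mixed)
    return mixed

# Hypothetical usage: poison 5% of a 52,000-example training set.
# clean = load_gpt4_llm_english_split()                        # hypothetical loader
# poison = [make_poisoned_example(ex) for ex in clean[:2600]]  # hypothetical helper
# train_set = mix_in_poison(clean, poison, poison_ratio=0.05)
```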