On the Exploitability of Instruction Tuning

Authors: Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that Auto Poison allows an adversary to change a model s behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples.
Researcher Affiliation Collaboration Manli Shu1 Jiongxiao Wang2 Chen Zhu3 Jonas Geiping 1 Chaowei Xiao 2 Tom Goldstein1 1 University of Maryland, 2 University of Wisconsin-Madison, 3 Google Deepmind
Pseudocode No The paper describes the Auto Poison pipeline conceptually and visually in Figure 1, but it does not include a formal pseudocode block or an algorithm listing.
Open Source Code Yes The code for generating poisoned data and instruction tuning can be found via this anonymous link: https://tinyurl.com/mwxnm3t6.
Open Datasets Yes We use the English split of GPT-4-LLM [11]3, an open-source dataset of machinegenerated instruction-following data. It consists of 52,000 training examples with GPT-4 [1] generated responses. ... We use databricks-dolly-15k [5], a dataset of 15,011 human-labeled instruction-following examples.
Dataset Splits Yes We use the English split of GPT-4-LLM [11]3... as our training data... We evaluate the instruction-tuned models on databricks-dolly-15k [5], a dataset of 15,011 human-labeled instruction-following examples.
Hardware Specification Yes We fine-tune OPT-350M on a single RTX A5000 GPU with 24GB memory. ... OPT-1.3B models are fine-tuned on a single RTX A6000 GPU with 48GB memory. ... We fine-tune OPT-6.7B using 2 A100 GPUs with 40GB memory each
Software Dependencies No The paper mentions models like Vicuna-7B and Llama, and states that the code is built based on stanford-alpaca, but it does not provide specific version numbers for general software dependencies like Python, PyTorch, or TensorFlow libraries.
Experiment Setup Yes Our models are trained for three epochs with an effective batch size of 128. We set the learning rate as 0.00002 with 0 weight decay. We use the cosine learning rate scheduler with a warmup ratio of 0.03. We use greedy decoding at inference