Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack

Authors: Tiansheng Huang, Sihao Hu, Ling Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results on open source mainstream LLMs (e.g., Llama2, OPT, Vicuna) demonstrate that Vaccine can boost the robustness of alignment against harmful-prompt-induced embedding drift while preserving reasoning ability towards benign prompts.
Researcher Affiliation | Academia | Tiansheng Huang, Sihao Hu, Ling Liu, School of Computer Science, Georgia Institute of Technology, Atlanta, USA; {thuang374, shu335}@gatech.edu, ling.liu@cc.gatech.edu
Pseudocode | Yes | Algorithm 1 Vaccine: perturbation-aware alignment
Open Source Code | Yes | Our code is available at https://github.com/git-disl/Vaccine.
Open Datasets | Yes | For the alignment task, we use the safe samples from the alignment dataset of BeaverTails (Ji et al., 2023). For the fine-tuning task, we consider SST2 (Socher et al., 2013), AGNEWS (Zhang et al., 2015), GSM8K (Cobbe et al., 2021) and AlpacaEval (Li et al., 2023b) as the user fine-tuning tasks. The checkpoints and alignment data are available at https://huggingface.co/anonymous4486.
Dataset Splits | No | The paper specifies sample counts for alignment, fine-tuning, and testing, but does not provide explicit training/validation/test splits (e.g., percentages or counts for a distinct validation set).
Hardware Specification | Yes | All the experiments are done with an A100-80G.
Software Dependencies | No | The paper mentions software such as AdamW and LoRA, but does not specify version numbers.
Experiment Setup | Yes | The rank of the adaptor is set to 8. For alignment, we use AdamW as the optimizer (Loshchilov & Hutter, 2017) with a learning rate of 1e-3 and a weight decay factor of 0.1. For fine-tuning tasks, we use the same optimizer with a smaller learning rate of 1e-5. We train 50 epochs for alignment. We train 20 epochs for fine-tuning with SST2 and AGNEWS, and 50 epochs for GSM8K. Both alignment and fine-tuning use the same batch size of 5.