Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on several chat models (Meta's Llama 2-Chat, Mistral AI's Mistral 7B Instruct v0.2, and OpenAI's GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the "Pure Tuning, Safe Testing" (PTST) strategy: fine-tune models without a safety prompt, but include it at test time (a minimal prompt-construction sketch follows the table). |
| Researcher Affiliation | Academia | Kaifeng Lyu¹, Haoyu Zhao¹, Xinran Gu², Dingli Yu¹, Anirudh Goyal, Sanjeev Arora¹. ¹Computer Science Department & Princeton Language and Intelligence, Princeton University; ²Institute for Interdisciplinary Information Sciences, Tsinghua University. {klyu,arora}@cs.princeton.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/vfleaking/PTST |
| Open Datasets | Yes | Fine-tuning experiments on GSM8K, ChatDoctor, and OpenOrca show that PTST significantly reduces the rise of unsafe behaviors. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide validation dataset splits (percentages, counts, or predefined citations) for reproducibility. |
| Hardware Specification | Yes | Except for the GPT experiments conducted using the OpenAI API, all our experiments were run on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies such as PyTorch, CUDA, or other libraries, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | For each of the 5 templates mentioned above, we fine-tune Llama-2-7b-chat with learning rate 10⁻⁴ for 6 epochs, where these two hyperparameters are picked based on the helpfulness performance when the template is chat:vanilla (a hedged configuration sketch follows below). |
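
To make the PTST recipe from the table concrete, the sketch below contrasts fine-tuning-time and test-time prompt construction. It assumes the standard Llama-2 `[INST]`/`<<SYS>>` chat format and a generic safety system prompt; the exact template strings and system-prompt wording used in the paper may differ.

```python
# Minimal sketch of "Pure Tuning, Safe Testing" (PTST) prompt construction.
# Assumptions: Llama-2-style [INST]/<<SYS>> template and a generic safety
# system prompt; the paper's actual template strings may differ.

SAFETY_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

def build_prompt(user_message: str, safety_prompt: str | None = None) -> str:
    """Wrap a user message in a Llama-2-style chat template, optionally with a system prompt."""
    if safety_prompt is None:
        # Pure Tuning: fine-tuning examples carry no safety system prompt.
        return f"[INST] {user_message} [/INST]"
    # Safe Testing: the safety system prompt is added only at inference time.
    return f"[INST] <<SYS>>\n{safety_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

# Fine-tuning time (pure tuning): no safety prompt in the template.
train_prompt = build_prompt("Natalia sold clips to 48 of her friends...")

# Test time (safe testing): the same user message, now preceded by the safety prompt.
test_prompt = build_prompt("Natalia sold clips to 48 of her friends...", SAFETY_PROMPT)
```

The design point PTST exploits is that the template used for fine-tuning and the template used at inference need not match; the safety prompt is withheld during tuning and reinstated only when serving the model.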
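For the quoted experiment setup (learning rate 10⁻⁴, 6 epochs on Llama-2-7b-chat), a hedged Hugging Face `TrainingArguments` sketch is shown below; only the learning rate and epoch count come from the paper, while the batch size, precision, and output path are illustrative assumptions.

```python
# Hedged sketch of the quoted fine-tuning hyperparameters using Hugging Face
# transformers; only learning_rate and num_train_epochs come from the paper,
# all other values are illustrative placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7b-chat-finetuned",  # placeholder output path
    learning_rate=1e-4,                     # quoted: learning rate 10^-4
    num_train_epochs=6,                     # quoted: 6 epochs
    per_device_train_batch_size=4,          # assumption: not stated in the quote
    bf16=True,                              # assumption: typical on A100 GPUs
)
```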