Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments on several chat models (Meta's Llama 2-Chat, Mistral AI's Mistral 7B Instruct v0.2, and OpenAI's GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the Pure Tuning, Safe Testing (PTST) strategy: fine-tune models without a safety prompt, but include it at test time.
Researcher Affiliation Academia Kaifeng Lyu1, Haoyu Zhao1, Xinran Gu2, Dingli Yu1, Anirudh Goyal, Sanjeev Arora1 1Computer Science Department & Princeton Language and Intelligence, Princeton University 2Institute for Interdisciplinary Information Sciences, Tsinghua University {klyu,arora}@cs.princeton.edu
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code Yes Code: https://github.com/vfleaking/PTST
Open Datasets Yes Fine-tuning experiments on GSM8K, ChatDoctor, and OpenOrca show that PTST significantly reduces the rise of unsafe behaviors.
Dataset Splits No The paper mentions training and testing but does not explicitly provide validation dataset splits (percentages, counts, or citations to predefined splits) needed for reproducibility.
Hardware Specification Yes Except for the GPT experiments conducted using the OpenAI API, all our experiments were run on 8 NVIDIA A100 GPUs.
Software Dependencies No The paper does not provide specific version numbers for ancillary software dependencies such as PyTorch, CUDA, or other libraries, which are necessary for a reproducible description of the environment.
Experiment Setup Yes For each of the 5 templates mentioned above, we fine-tune Llama-2-7b-chat with learning rate 10^-4 for 6 epochs, where these two hyperparameters are picked based on the helpfulness performance when the template is chat:vanilla.
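To make the PTST strategy concrete, the sketch below contrasts the two prompt templates involved: a vanilla template used during fine-tuning (no safety prompt) and a safety-prompted template used at inference. The template strings follow the general Llama-2 chat format; the safety text and function names here are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of Pure Tuning, Safe Testing (PTST):
# fine-tune WITHOUT a safety system prompt, but include one at test time.
# The safety text below is illustrative, not the paper's exact wording.

SAFETY_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

def build_prompt(user_message, system_prompt=None):
    """Wrap a user message in a Llama-2-style chat template,
    optionally with a <<SYS>> system prompt."""
    if system_prompt:
        return (
            f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"

# Pure Tuning: fine-tuning examples use the vanilla template.
train_prompt = build_prompt("Natalia sold clips to 48 friends...")

# Safe Testing: inference includes the safety prompt.
test_prompt = build_prompt(
    "Natalia sold clips to 48 friends...",
    system_prompt=SAFETY_PROMPT,
)
```

The key point is the template mismatch between the two stages: the model never sees the safety prompt during fine-tuning, so the safety behavior tied to it is less likely to be overwritten by the fine-tuning data.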