Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on several chat models (Meta s Llama 2-Chat, Mistral AI s Mistral 7B Instruct v0.2, and Open AI s GPT3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the Pure Tuning, Safe Testing (PTST) strategy fine-tune models without a safety prompt, but include it at test time. |
| Researcher Affiliation | Academia | Kaifeng Lyu1 , Haoyu Zhao1 , Xinran Gu2 , Dingli Yu1, Anirudh Goyal, Sanjeev Arora1 1Computer Science Department & Princeton Language and Intelligence, Princeton Univeristy 2 Institute for Interdisciplinary Information Sciences, Tsinghua University EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | 1Code: https://github.com/vfleaking/PTST |
| Open Datasets | Yes | Fine-tuning experiments on GSM8K, Chat Doctor, and Open Orca show that PTST significantly reduces the rise of unsafe behaviors.1 |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide validation dataset splits (percentages, counts, or predefined citations) for reproducibility. |
| Hardware Specification | Yes | Except for the GPT experiments conducted using the Open AI API, all our experiments were run on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies such as PyTorch, CUDA, or other libraries, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | For each of the 5 templates mentioned above, we fine-tune Llama-2-7b-chat with learning rate 10 4 for 6 epochs, where these two hyperparameters are picked based on the helpfulness performance when the template is chat:vanilla. |