Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Authors: Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that PASTA can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs, leading to a significant performance improvement on a variety of tasks, e.g., an average accuracy improvement of 22% for LLaMA-7B. Our code is publicly available at https://github.com/QingruZhang/PASTA.
Researcher Affiliation | Collaboration | Georgia Institute of Technology; University of California, Berkeley; Microsoft Research. {qingru.zhang,tourzhao}@gatech.edu, binyu@berkeley.edu, {chansingh,lucliu,xiaodl,jfgao}@microsoft.com
Pseudocode | Yes | Algorithm 1 PASTA: Post-hoc Attention Steering Approach
Open Source Code | Yes | Our code is publicly available at https://github.com/QingruZhang/PASTA.
Open Datasets | Yes | Table 6 provides detailed statistics of datasets in our experiments. ... We implement PASTA for two pre-trained models: GPT-J-6B (Wang & Komatsuzaki, 2021) and LLaMA-7B (Touvron et al., 2023) on tasks that span complex instructions, lengthy contexts, and knowledge conflicts within contexts. For (i), we introduce two new tasks: JSON formatting and Pronouns changing. For (ii) and (iii), we study Bias in Bios (De-Arteaga et al., 2019) and CounterFact (Meng et al., 2022a).
Dataset Splits | Yes | Table 6 provides detailed statistics of datasets in our experiments. Train/Valid/Test splits: CounterFact 1000/1000/5000; Bias in Bios 1000/1000/5000; JSON Formatting 1000/1000/5000; Pronouns Changing 1000/1000/5000. For every task, we split data into train/validation/test sets following (Hernandez et al., 2023) (see Appendix A) and select |H| by cross-validation.
Hardware Specification | Yes | We implement all algorithms using PyTorch (Paszke et al., 2019) and Huggingface (Wolf et al., 2019) and run experiments on NVIDIA V100 GPUs and NVIDIA A6000 GPUs.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019) and Huggingface (Wolf et al., 2019)" but does not provide specific version numbers for these software components.
Experiment Setup | Yes | We implement PASTA for two pre-trained models: GPT-J-6B (Wang & Komatsuzaki, 2021) and LLaMA-7B (Touvron et al., 2023). ... Empirically, we find that PASTA is not sensitive to the scaling coefficient α (see Section 5.3) and fix it to 0.01 in our experiments. We select 1000 training samples from each of the 4 tasks above for model profiling. After model profiling, we select k from {300, 400, 500} for LLaMA-7B to have the number of steered heads |H| as {25, 53, 86}. ... For all tasks, model outputs are generated with greedy search. Table 7 presents the number of heads to be steered by PASTA for LLaMA-7B and GPT-J-6B on every task. Appendix A.1 provides detailed prompt templates for each task.
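
The Pseudocode row above cites Algorithm 1 (PASTA). As the paper describes it, the steering operation scales down the attention that selected heads pay to non-highlighted tokens by the coefficient α (fixed to 0.01) and renormalizes each row. Below is a minimal sketch of that operation for a single head's attention matrix; the function name, tensor shapes, and mask convention are illustrative assumptions, not the interface of the released code.

    import torch

    def steer_attention(attn, highlight_mask, alpha=0.01):
        # attn:           (seq_len, seq_len) row-stochastic attention weights of one selected head
        # highlight_mask: (seq_len,) bool, True for tokens in the user-highlighted span G
        # alpha:          scaling coefficient for tokens outside G (the paper fixes it to 0.01)
        scale = torch.where(
            highlight_mask,
            torch.ones_like(attn[0]),
            torch.full_like(attn[0], alpha),
        )
        steered = attn * scale                               # downweight attention to non-highlighted tokens
        return steered / steered.sum(dim=-1, keepdim=True)   # renormalize so each query's row sums to one

In practice this reweighting would be applied only to the heads in the steered set H during the model's forward pass (e.g., via hooks on the selected attention layers); no parameters are updated, which is what makes the method post-hoc.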
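The Experiment Setup row quotes the model-profiling step: candidate heads are evaluated on 1000 training samples per task, ranked per task, and the steered set H is chosen with a rank cutoff k in {300, 400, 500}, giving |H| in {25, 53, 86} for LLaMA-7B. The sketch below keeps the heads that rank within the top k on every profiling task, which is consistent with a cutoff of several hundred yielding only a few dozen steered heads; treat the exact selection rule, the function name, and the data layout as assumptions and consult the released code for the authors' procedure.

    def select_steering_heads(per_task_rankings, k):
        # per_task_rankings: dict mapping task name -> list of (layer, head) pairs,
        #                    ordered best-first by profiling performance when that
        #                    single head is steered on the task's training samples
        # k:                 rank cutoff (the paper reports k in {300, 400, 500} for LLaMA-7B)
        top_k_sets = [set(ranking[:k]) for ranking in per_task_rankings.values()]
        return set.intersection(*top_k_sets)   # heads ranked within the top k on every profiling task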