Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Authors: Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. (A sketch of the intervention step follows the table.) |
| Researcher Affiliation | Academia | Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg (Harvard University) |
| Pseudocode | No | The paper describes the method using equations and textual descriptions, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code: https://github.com/likenneth/honest_llama. |
| Open Datasets | Yes | To operationalize the concept of truth, we choose TruthfulQA by Lin et al. (2021), a dataset adversarially constructed so that some humans would answer falsely due to false beliefs or misconceptions. |
| Dataset Splits | Yes | For each sample in TruthfulQA, we concatenate the question/answer together and take out head activations at the last token to collect a probing dataset $\{(x_l^h, y)_i\}_{i=1}^{N}$ for each head in each layer. We then randomly split each dataset into training and validation sets 4:1, fit a binary linear classifier on the training set, and use the validation accuracy to measure how each head is related to performance on the benchmark data. (A probing sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for running experiments. |
| Software Dependencies | No | The paper mentions various models (LLaMA, Alpaca, Vicuna, GPT-3) and frameworks (RLHF, Harness) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Our method contains two key parameters: $K \in \mathbb{N}^+$, the number of heads where the intervention takes place, and $\alpha \in \mathbb{R}^+$, the strength of the intervention. Although we do not have a theoretical argument for the best values, we explore their effects experimentally and determine optimal values via a standard hyperparameter sweep. We choose the optimal hyperparameters $K = 48$ and $\alpha = 15$ by considering multiple scores. (A head-selection sketch follows the table.) |
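
To make the intervention concrete, here is a minimal NumPy sketch of ITI's core update: at each decoding step, each selected head's activation is shifted along its truthful direction, scaled by the activation standard deviation and the strength α. The function name `apply_iti` and the data layout (dicts keyed by `(layer, head)` pairs) are illustrative assumptions, not the paper's implementation; the released code at https://github.com/likenneth/honest_llama is the authoritative version.

```python
import numpy as np

def apply_iti(head_activations, selected_heads, directions, stds, alpha=15.0):
    """Shift selected attention-head activations along their truthful
    directions, as ITI does at every autoregressive decoding step.

    head_activations: dict mapping (layer, head) -> activation vector
    selected_heads:   list of (layer, head) pairs (the top-K probed heads)
    directions:       dict mapping (layer, head) -> unit direction vector
    stds:             dict mapping (layer, head) -> std of activations
                      projected onto that direction
    alpha:            intervention strength (the paper settles on 15)
    """
    for key in selected_heads:
        # x <- x + alpha * sigma * theta, applied only to the chosen heads
        head_activations[key] = (
            head_activations[key] + alpha * stds[key] * directions[key]
        )
    return head_activations
```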
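The per-head probing step described under "Dataset Splits" can be sketched with scikit-learn. The 4:1 train/validation split and the binary linear classifier follow the paper's description; the helper name `probe_head` and the choice of `LogisticRegression` as the linear probe are assumptions for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_head(X, y, seed=0):
    """Fit a linear truthfulness probe on one head's activations.

    X: (N, d) last-token activations of a single attention head,
       one row per concatenated question/answer pair
    y: (N,) binary labels (1 = truthful answer, 0 = not)
    Returns the validation accuracy used to rank heads.
    """
    # 4:1 train/validation split, as described in the paper
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_va, y_va)
```

Running `probe_head` over every head in every layer yields one validation accuracy per head, which is what the next step ranks.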
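Finally, the hyperparameter $K = 48$ can be read as keeping the K heads whose probes generalize best on the validation split. A small sketch under that assumption (the helper name `select_top_heads` is hypothetical):

```python
def select_top_heads(val_accs, k=48):
    """Pick the K heads whose probes have the highest validation accuracy.

    val_accs: dict mapping (layer, head) -> probe validation accuracy
    Returns the K (layer, head) pairs to intervene on.
    """
    ranked = sorted(val_accs, key=val_accs.get, reverse=True)
    return ranked[:k]

# A sweep over (K, alpha) would then score each setting on TruthfulQA;
# the paper reports K = 48 and alpha = 15 as the chosen optimum.
```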