HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection

Authors: Xuefeng Du, Chaowei Xiao, Sharon Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that HaloScope can achieve superior hallucination detection performance, outperforming the competitive rivals by a significant margin. ... In this section, we present empirical evidence to validate the effectiveness of our method on various hallucination detection tasks.
Researcher Affiliation | Academia | Xuefeng Du (1), Chaowei Xiao (2), Yixuan Li (1); (1) Department of Computer Sciences, University of Wisconsin-Madison; (2) Information School, University of Wisconsin-Madison; {xfdu,sharonli}@cs.wisc.edu, cxiao34@wisc.edu
Pseudocode | No | The paper describes its methods using mathematical formulations and textual explanations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/deeplearning-wisc/haloscope.
Open Datasets | Yes | We consider four generative question-answering (QA) tasks for evaluation, including two open-book conversational QA datasets COQA [37] and TRUTHFULQA [29] (generation track), closed-book QA dataset TRIVIAQA [20], and reading comprehension dataset TYDIQA-GP (English) [9]. (A loading sketch follows the table.)
Dataset Splits | Yes | We reserve 25% of the available QA pairs for testing and 100 QA pairs for validation, and the remaining questions are used to simulate the unlabeled generations in the wild. ... The layer index for representation extraction, the number of singular vectors k, and the filtering threshold T are determined using the separate validation set. (A split sketch follows the table.)
Hardware Specification | Yes | We run all experiments with Python 3.8.5 and PyTorch 1.13.1, using NVIDIA RTX A6000 GPUs.
Software Dependencies | Yes | We run all experiments with Python 3.8.5 and PyTorch 1.13.1, using NVIDIA RTX A6000 GPUs.
Experiment Setup | Yes | The truthfulness classifier gθ is a two-layer MLP with ReLU non-linearity and an intermediate dimension of 1,024. We train gθ for 50 epochs with SGD optimizer, an initial learning rate of 0.05, cosine learning rate decay, batch size of 512, and weight decay of 3e-4. (A training sketch follows the table.)
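The four evaluation datasets named in the Open Datasets row are publicly available. The sketch below shows one way they could be pulled with the Hugging Face datasets library; the Hub identifiers and configuration names are assumptions on our part, and the paper's own prompting and answer-extraction pipeline is not reproduced here.

```python
from datasets import load_dataset

# Hub identifiers and config names below are assumptions about where public
# copies of the four QA datasets live; the paper may preprocess them differently.
coqa       = load_dataset("coqa")                       # open-book conversational QA
truthfulqa = load_dataset("truthful_qa", "generation")  # generation track
triviaqa   = load_dataset("trivia_qa", "rc.nocontext")  # closed-book QA
tydiqa     = load_dataset("tydiqa", "secondary_task")   # TyDi QA GoldP; restrict to English examples
```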
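The split described in the Dataset Splits row can be simulated with a simple partition of each dataset's QA pairs. The helper below is a minimal sketch; the function name, shuffling, and seed are our own choices, not taken from the paper.

```python
import random

def split_qa_pairs(qa_pairs, test_frac=0.25, n_val=100, seed=0):
    """Partition QA pairs into test, validation, and unlabeled 'wild' sets,
    mirroring the reported split: 25% of pairs for testing, 100 pairs for
    validation, and the remainder treated as unlabeled generations in the wild."""
    pairs = list(qa_pairs)
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_frac)
    test_set = pairs[:n_test]
    val_set = pairs[n_test:n_test + n_val]
    wild_set = pairs[n_test + n_val:]  # unlabeled LLM generations "in the wild"
    return test_set, val_set, wild_set
```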
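The Experiment Setup row pins down the truthfulness classifier gθ and its optimizer. Below is a minimal PyTorch sketch of that configuration; the input feature dimension, loss function, and data loader are assumptions (they depend on the LLM whose hidden states are scored), while the architecture and optimizer hyperparameters follow the quoted setup.

```python
import torch
import torch.nn as nn

# The hidden-state feature dimension is an assumption; it depends on the LLM
# whose representations are used (e.g., 4096 for a 7B LLaMA-family model).
FEATURE_DIM = 4096
EPOCHS = 50

# Two-layer MLP truthfulness classifier g_theta with ReLU non-linearity and an
# intermediate dimension of 1,024, as reported in the Experiment Setup row.
classifier = nn.Sequential(
    nn.Linear(FEATURE_DIM, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1),
)

# SGD with initial learning rate 0.05, weight decay 3e-4, and cosine learning
# rate decay over the 50 training epochs, as reported.
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.05, weight_decay=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

def train(loader):
    """Training loop sketch; `loader` should yield (features, labels) batches
    of size 512. The binary cross-entropy loss is an assumption, since the
    paper derives its training labels from its own membership-scoring step."""
    criterion = nn.BCEWithLogitsLoss()
    for _ in range(EPOCHS):
        for features, labels in loader:
            optimizer.zero_grad()
            loss = criterion(classifier(features).squeeze(-1), labels.float())
            loss.backward()
            optimizer.step()
        scheduler.step()
```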