Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

Authors: Qitan Lv, Jie Wang, Hanzhu Chen, Bin Li, Yongdong Zhang, Feng Wu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, leading to a superior performance over 30% in the F1 score metric. Moreover, COFT also exhibits remarkable versatility across various long-form tasks, such as reading comprehension and question answering.
Researcher Affiliation | Academia | CAS Key Laboratory of Technology in GIPAS & MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China.
Pseudocode | Yes | Algorithm 1: Pseudo code for entity-level iterative algorithm
Open Source Code | No | The paper mentions implementing their approach based on PyTorch and Huggingface's Transformers (third-party libraries), but does not state that their own COFT implementation is open-source or provide a link to it.
Open Datasets | Yes | For knowledge hallucination, we use FELM (Chen et al., 2023c) as the benchmark... For reading comprehension, we use RACE-H (high school level reading comprehension) and RACE-M (middle school level reading comprehension) (Lai et al., 2017)... For question answering, we use Natural Questions (Kwiatkowski et al., 2019), TriviaQA (Joshi et al., 2017), and WebQ (Berant et al., 2013) as our benchmarks.
Dataset Splits | Yes | Table 5. Statistics of the reading comprehension benchmarks, RACE-H and RACE-M. The values below the Training/Valid/Testing Set are the number of passages and questions in each dataset, respectively.
Hardware Specification | Yes | All experiments were performed on four Nvidia A100 GPUs (80GB).
Software Dependencies | Yes | We implement our approach based on PyTorch 1.13.0 and Huggingface's Transformers.
Experiment Setup | Yes | To guarantee stable and reproducible results, we utilize greedy decoding and set the temperature parameter as 0 in all experiments. ... For the small language models used for calculating self-information, we apply LLaMA-7B.
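
The experiment-setup row above mentions greedy decoding with temperature 0 and a small language model (LLaMA-7B) for computing self-information. The sketch below is a rough illustration of that setup under Huggingface Transformers, not the authors' released COFT code: it scores per-token self-information with a small causal LM and decodes greedily. The checkpoint id huggyllama/llama-7b and the helper name token_self_information are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; the paper only says "LLaMA-7B", so any small causal LM can stand in.
MODEL_NAME = "huggyllama/llama-7b"

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)
model.eval()

@torch.no_grad()
def token_self_information(text: str):
    """Per-token self-information -log2 p(token | prefix) under the small LM, in bits."""
    enc = tokenizer(text, return_tensors="pt").to(device)
    logits = model(**enc).logits                        # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], -1)   # prefix up to t predicts token t+1
    targets = enc["input_ids"][:, 1:]
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    bits = (-token_logp / torch.log(torch.tensor(2.0))).squeeze(0)
    return list(zip(tokenizer.convert_ids_to_tokens(targets[0]), bits.tolist()))

# Deterministic (greedy) decoding, i.e. the "temperature = 0" setting in sampling-API terms.
prompt = "Question: Who wrote Hamlet?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
print(token_self_information("William Shakespeare wrote Hamlet."))
```

Tokens with high self-information (low probability under the small LM) are the kind of informative spans a highlighting scheme would prioritize, while greedy decoding keeps the generation deterministic across reruns, matching the reproducibility claim quoted above.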