Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
Authors: Qitan Lv, Jie Wang, Hanzhu Chen, Bin Li, Yongdong Zhang, Feng Wu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, leading to a superior performance over 30% in the F1 score metric. Moreover, COFT also exhibits remarkable versatility across various long-form tasks, such as reading comprehension and question answering. |
| Researcher Affiliation | Academia | CAS Key Laboratory of Technology in GIPAS & MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China. |
| Pseudocode | Yes | Algorithm 1 Pseudo code for entity-level iterative algorithm |
| Open Source Code | No | The paper mentions implementing their approach based on PyTorch and Huggingface's Transformers (third-party libraries), but does not state that their own COFT implementation code is open-source or provide a link to it. |
| Open Datasets | Yes | For knowledge hallucination, we use FELM (Chen et al., 2023c) as the benchmark... For reading comprehension, we use RACE-H (high school level reading comprehension) and RACE-M (middle school level reading comprehension) (Lai et al., 2017)... For question answering, we use Natural Questions (Kwiatkowski et al., 2019), TriviaQA (Joshi et al., 2017), and WebQ (Berant et al., 2013) as our benchmarks. |
| Dataset Splits | Yes | Table 5. Statistics of the reading comprehension benchmarks, RACE-H and RACE-M. The values below the Training/Valid/Testing Set are the number of passages and questions in each dataset, respectively. |
| Hardware Specification | Yes | All experiments were performed on four Nvidia A100 GPUs (80GB). |
| Software Dependencies | Yes | We implement our approach based on PyTorch 1.13.0 and Huggingface's Transformers. |
| Experiment Setup | Yes | To guarantee stable and reproducible results, we utilize greedy decoding and set the temperature parameter as 0 in all experiments. ... For the small language models used for calculating self-information, we apply LLaMA-7B. |
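
Since the authors' COFT implementation is not released, the sketch below is only an illustration of the setup quoted in the table: deterministic greedy decoding and token-level self-information computed with a small language model. The checkpoint ID `huggyllama/llama-7b` and the helper names `token_self_information` / `greedy_generate` are assumptions, not the paper's code; the paper states only that LLaMA-7B is used and that temperature is set to 0 (with greedy decoding, which makes the temperature irrelevant).

```python
# Illustrative sketch of the reported experiment setup (not the authors' code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # assumed checkpoint; the paper only says "LLaMA-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()


@torch.no_grad()
def token_self_information(text: str) -> list[tuple[str, float]]:
    """Return (token, -log p(token | prefix)) pairs, i.e. token-level self-information."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    input_ids = enc["input_ids"]
    logits = model(**enc).logits                      # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)  # position i predicts token i+1
    targets = input_ids[:, 1:]
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (1, seq_len - 1)
    tokens = tokenizer.convert_ids_to_tokens(targets[0].tolist())
    return list(zip(tokens, nll[0].tolist()))


@torch.no_grad()
def greedy_generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy decoding (do_sample=False), matching the deterministic setup in the table."""
    enc = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**enc, do_sample=False, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True)
```

With `do_sample=False`, Transformers selects the argmax token at every step, which is the standard way to realize a "temperature 0" configuration deterministically.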