ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that the finally obtained hallucination annotator with only 7B parameters surpasses GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA by zero-shot inference.
Researcher Affiliation | Collaboration | Yuzhe Gu (1,2), Ziwei Ji (2,3), Wenwei Zhang (2), Chengqi Lyu (2), Dahua Lin (2,4,5), Kai Chen (2). Affiliations: 1 Shanghai Jiao Tong University; 2 Shanghai AI Laboratory; 3 Hong Kong University of Science and Technology; 4 MMLab, The Chinese University of Hong Kong; 5 HKGAI under InnoHK.
Pseudocode | No | The paper describes its process through text and diagrams (e.g., Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Dataset, code, and model are released at https://github.com/open-compass/ANAH.
Open Datasets | Yes | Dataset, code, and model are released at https://github.com/open-compass/ANAH. We utilize the ANAH dataset [29] as our seed data... The final dataset encompasses over 3k topics, 196k model responses, and 822k annotated sentences, in both English and Chinese (Tab. 1).
Dataset Splits | No | The paper mentions using a test set but does not provide specific details or splits for a validation dataset.
Hardware Specification | Yes | Our model is trained on 32 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions the LMDeploy library [17] but does not provide a specific version number for this or any other software dependency. (A hedged inference sketch with LMDeploy follows the table.)
Experiment Setup | Yes | In the M-step, we train the annotator model with the following settings and hyper-parameters: the number of epochs is 1, the learning rate is 1e-5, the AdamW optimizer is used with a linear scheduler, and the maximum sequence length is set to 32k. (A training-configuration sketch follows the table.)
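The Software Dependencies row notes that inference relies on the LMDeploy library without pinning a version. As a point of reference, the snippet below is a minimal sketch of how an annotator checkpoint could be served with LMDeploy's Python pipeline API; the checkpoint path and prompt are placeholders, not the paper's released artifacts or prompt format.

```python
# Minimal sketch (not the paper's code): running a hallucination-annotator checkpoint
# with LMDeploy's high-level pipeline API. The checkpoint path and prompt below are
# placeholders; see https://github.com/open-compass/ANAH for the released model and
# the exact annotation prompt format.
from lmdeploy import pipeline

pipe = pipeline("path/to/anah-v2-annotator-7b")  # placeholder checkpoint path
responses = pipe(["<annotation prompt for one sentence of a model response>"])
print(responses[0].text)
```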
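The Experiment Setup row lists the M-step hyper-parameters in prose. The sketch below restates them as a Hugging Face TrainingArguments object, assuming a transformers-based fine-tuning stack; the output directory and precision flag are our assumptions, and the 32k sequence limit would be enforced at tokenization time rather than through TrainingArguments.

```python
# Sketch of the reported M-step fine-tuning settings (1 epoch, lr 1e-5, AdamW with a
# linear scheduler, 32k maximum sequence length), assuming a transformers-style trainer.
# The output directory and bf16 flag are assumptions, not stated in the paper.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="anah_v2_annotator",  # placeholder output path
    num_train_epochs=1,              # one training epoch
    learning_rate=1e-5,              # reported learning rate
    optim="adamw_torch",             # AdamW optimizer
    lr_scheduler_type="linear",      # linear learning-rate scheduler
    bf16=True,                       # assumed mixed precision on A100 GPUs
)

MAX_SEQ_LENGTH = 32 * 1024  # 32k maximum sequence length, enforced when tokenizing
```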