ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that the finally obtained hallucination annotator with only 7B parameters surpasses GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA by zero-shot inference. |
| Researcher Affiliation | Collaboration | Yuzhe Gu (1,2), Ziwei Ji (2,3), Wenwei Zhang (2), Chengqi Lyu (2), Dahua Lin (2,4,5), Kai Chen (2). Affiliations: (1) Shanghai Jiao Tong University; (2) Shanghai AI Laboratory; (3) Hong Kong University of Science and Technology; (4) MMLab, The Chinese University of Hong Kong; (5) HKGAI under InnoHK |
| Pseudocode | No | The paper describes its process through text and diagrams (e.g., Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Dataset, code, and model are released at https://github.com/open-compass/ANAH. |
| Open Datasets | Yes | Dataset, code, and model are released at https://github.com/open-compass/ANAH. We utilize ANAH dataset [29] as our seed data... The final dataset encompasses both over 3k topics, 196k model responses, and 822k annotated sentences, in English and Chinese (Tab. 1). |
| Dataset Splits | No | The paper mentions using a test set but does not provide specific details or splits for a validation dataset. |
| Hardware Specification | Yes | Our model is trained on 32 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions 'LMDeploy library [17]' but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | In M-Step, we train the annotator model with the following settings and hyper-parameters: the epoch is 1, the learning rate is 1e-5, and the AdamW optimizer is with a linear scheduler, the maximum sequence length is set to 32k. |
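
The M-Step settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration under stated assumptions, not the authors' training code: the names `MStepConfig` and `linear_lr` are hypothetical, "32k" is read as 32,768 tokens, and the linear scheduler is assumed to decay to zero with no warmup, since the quoted text says only "linear scheduler". The step count is also made up, as none is reported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MStepConfig:
    """Hyper-parameters quoted in the Experiment Setup row (names are hypothetical)."""
    epochs: int = 1
    learning_rate: float = 1e-5
    optimizer: str = "AdamW"
    scheduler: str = "linear"
    max_seq_len: int = 32 * 1024  # assumption: "32k" means 32,768 tokens


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Linearly decay base_lr to 0 over total_steps.

    Assumption: no warmup phase and a final learning rate of 0;
    the quoted setup does not specify either.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    return base_lr * (1.0 - step / total_steps)


cfg = MStepConfig()
total_steps = 1000  # hypothetical; no step count is given in the summary
for s in (0, 500, 1000):
    print(s, linear_lr(s, total_steps, cfg.learning_rate))
```

In an actual run these values would be passed to whatever training framework is in use; the sketch only makes the quoted numbers and the assumed schedule shape explicit.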