reproducibilityindex.ai

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results demonstrate that the finally obtained hallucination annotator with only 7B parameters surpasses GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA by zero-shot inference.
Researcher Affiliation	Collaboration	Yuzhe Gu (1,2), Ziwei Ji (2,3), Wenwei Zhang (2), Chengqi Lyu (2), Dahua Lin (2,4,5), Kai Chen (2). (1) Shanghai Jiao Tong University; (2) Shanghai AI Laboratory; (3) Hong Kong University of Science and Technology; (4) MMLab, The Chinese University of Hong Kong; (5) HKGAI under InnoHK
Pseudocode	No	The paper describes its process through text and diagrams (e.g., Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	Dataset, code, and model are released at https://github.com/open-compass/ANAH.
Open Datasets	Yes	Dataset, code, and model are released at https://github.com/open-compass/ANAH. We utilize ANAH dataset [29] as our seed data... The final dataset encompasses both over 3k topics, 196k model responses, and 822k annotated sentences, in English and Chinese (Tab. 1).
Dataset Splits	No	The paper mentions using a test set but does not provide specific details or splits for a validation dataset.
Hardware Specification	Yes	Our model is trained on 32 NVIDIA A100 GPUs.
Software Dependencies	No	The paper mentions 'LMDeploy library [17]' but does not provide a specific version number for this or any other software dependency.
Experiment Setup	Yes	In M-Step, we train the annotator model with the following settings and hyper-parameters: the epoch is 1, the learning rate is 1e-5, and the AdamW optimizer is with a linear scheduler, the maximum sequence length is set to 32k.
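
The hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' training code: the names `MStepConfig` and `linear_lr` are hypothetical, "32k" is assumed to mean 32,768 tokens, and the linear scheduler is assumed to decay to zero with no warmup, since the quoted text does not specify either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MStepConfig:
    """Hyper-parameters quoted in the Experiment Setup row (hypothetical container)."""
    epochs: int = 1
    learning_rate: float = 1e-5
    optimizer: str = "AdamW"
    lr_scheduler: str = "linear"
    max_seq_length: int = 32 * 1024  # assumption: "32k" means 32,768 tokens


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Linearly decay the learning rate from base_lr to 0.

    Assumption: no warmup and a final learning rate of 0; the source
    only says "linear scheduler".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp step into [0, total_steps]
    return base_lr * (1.0 - step / total_steps)


cfg = MStepConfig()
# Learning rate at the start, midpoint, and end of a hypothetical 1000-step run.
for s in (0, 500, 1000):
    print(s, linear_lr(s, 1000, cfg.learning_rate))
```

The total step count (1000 here) is illustrative only; the source does not report the number of optimizer steps or the batch size.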