Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

Authors: Xinyu Lyu, Beitao Chen, Lianli Gao, Hengtao Shen, Jingkuan Song

NeurIPS 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental research demonstrates that our HIO strategy can effectively reduce hallucinations in LVLMs, outperforming state-of-the-art methods across various benchmarks.
Researcher Affiliation | Academia | 1 Southwestern University of Finance and Economics, Chengdu, China; 2 Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China; 3 Center for Future Media, University of Electronic Science and Technology of China; 4 Tongji University; 5 Engineering Research Center of Intelligent Finance, Ministry of Education
Pseudocode | Yes | Algorithm 1: Training LVLM to Amplify Multiple Targeted Hallucination
Open Source Code | Yes | Code is released at https://github.com/BT-C/HIO.
Open Datasets | Yes | We evaluate HIO on three benchmarks: (1) quantitative metrics of POPE Li et al. [2023b] on the MSCOCO Lin et al. [2014] dataset; (2) CHAIR Rohrbach et al. [2018], Caption Hallucination Assessment with Image Relevance...; (3) the general-purpose Multimodal Large Language Model Evaluation (MME) benchmark Fu et al. [2023]... (A simplified sketch of the CHAIR metric follows the table.)
Dataset Splits | Yes | Tab. 2 and Tab. 5 display results for 500 randomly selected images from the COCO val2017 and val2014 datasets, respectively.
Hardware Specification | Yes | The training is conducted on a robust computational setup: 4x RTX 3090 GPUs for LLaVA-1.5, 8x V100 GPUs for MiniGPT-4, and 4x A6000 GPUs for InstructBLIP.
Software Dependencies | No | The paper does not specify version numbers for any ancillary software dependencies (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | Hyperparameters including alpha and beta are set to 1.0 and 0.1, respectively, in accordance with the VCD model's specifications. (A sketch of how these enter contrastive decoding follows the table.)
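
For context on the CHAIR benchmark listed under Open Datasets: CHAIR scores a generated caption by comparing the objects it mentions against the objects annotated for the image, reporting a per-mention rate (CHAIR_i) and a per-sentence rate (CHAIR_s). The sketch below is a simplified illustration with hypothetical inputs; the official implementation additionally maps synonyms and plural forms onto the 80 MSCOCO object categories and counts repeated mentions.

```python
def chair_scores(captions_objects, gt_objects):
    """Simplified CHAIR_i and CHAIR_s over a set of captions.

    captions_objects: list of sets, objects mentioned in each generated caption
                      (already mapped to MSCOCO categories).
    gt_objects:       list of sets, ground-truth objects for each image.
    """
    hallucinated = 0    # mentioned objects that are not in the image
    total_mentions = 0  # all mentioned objects
    bad_captions = 0    # captions with at least one hallucinated object

    for mentioned, truth in zip(captions_objects, gt_objects):
        wrong = mentioned - truth
        hallucinated += len(wrong)
        total_mentions += len(mentioned)
        if wrong:
            bad_captions += 1

    chair_i = hallucinated / max(total_mentions, 1)          # per-object rate
    chair_s = bad_captions / max(len(captions_objects), 1)   # per-sentence rate
    return chair_i, chair_s

# Example: the first caption hallucinates a "dog" that is not in the image.
print(chair_scores([{"person", "dog"}, {"car"}],
                   [{"person", "bench"}, {"car", "road"}]))
```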
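
On the Experiment Setup row: the alpha and beta values follow VCD-style contrastive decoding, where alpha scales the contrast between the original logits and a contrastive branch and beta sets an adaptive plausibility cutoff. The function below is a minimal sketch of one such decoding step, assuming PyTorch tensors of next-token logits; it is not the authors' released implementation, and the source of the contrastive branch (a distorted image in VCD, a hallucination-amplified model in HIO) is only indicated in the docstring, not reproduced.

```python
import torch
import torch.nn.functional as F

def contrastive_decode_step(logits_orig: torch.Tensor,
                            logits_contrast: torch.Tensor,
                            alpha: float = 1.0,
                            beta: float = 0.1) -> torch.Tensor:
    """One step of VCD-style contrastive decoding.

    logits_orig:     next-token logits from the standard LVLM forward pass.
    logits_contrast: next-token logits from the contrastive branch
                     (distorted image in VCD; hallucination-amplified
                     model in HIO).  Both tensors have shape [vocab_size].
    """
    # Adaptive plausibility constraint: keep only tokens whose original
    # probability is at least beta times the most likely token's probability.
    probs_orig = F.softmax(logits_orig, dim=-1)
    keep = probs_orig >= beta * probs_orig.max()

    # Contrastive adjustment: amplify the original logits and subtract the
    # contrastive branch, scaled by alpha.
    adjusted = (1 + alpha) * logits_orig - alpha * logits_contrast

    # Mask out implausible tokens before argmax / sampling.
    return adjusted.masked_fill(~keep, float("-inf"))

# Example: greedy pick of the next token from two dummy logit vectors.
vocab_size = 32000
next_token = contrastive_decode_step(torch.randn(vocab_size),
                                     torch.randn(vocab_size)).argmax()
```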