Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Authors: Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LURE on six open-source LVLMs and find that it outperforms the previous best approach in general object hallucination evaluation metrics as well as in GPT and human evaluations.
Researcher Affiliation | Academia | UNC-Chapel Hill, Rutgers University, Columbia University, Stanford University
Pseudocode | Yes | The training pipeline is illustrated in Alg. 1. [...] The inference pipeline is detailed in Alg. 2. (A sketch of the inference idea follows this table.)
Open Source Code | Yes | Our data and code are available at https://github.com/YiyangZhou/LURE.
Open Datasets | Yes | MSCOCO (Lin et al., 2014) is a comprehensive dataset used for image recognition, segmentation, and captioning. (A loading example follows this table.)
Dataset Splits | Yes | All hyperparameters are selected via cross-validation.
Hardware Specification | Yes | Here, we only need one A100 80GB GPU for training, which takes approximately 10 minutes.
Software Dependencies | No | The paper mentions software components such as GPT-3.5, MiniGPT-4, LLaMA, PyTorch, and various LVLM backbones (e.g., Vicuna, LLaMA-Adapter), but it does not specify version numbers for any of these dependencies.
Experiment Setup | Yes | Table 6: Training hyperparameters. Training steps: 410; Warmup steps: 50; Max length: 512; Batch size of multi-modal instruction data: 12; Optimizer: AdamW; Learning rate: 3e-5; Learning rate decay: Cosine; AdamW ε: 1e-6; AdamW β: (0.9, 0.999); Weight decay: 0.05. (A configuration sketch follows this table.)
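
The pseudocode row above points at Alg. 1 (training) and Alg. 2 (inference) in the paper. As a rough illustration of the inference-side idea, the minimal Python sketch below masks object mentions the base LVLM is uncertain about and hands the masked text to the trained revisor. The threshold value, the [IDK] placeholder spelling, and every function and variable name here are illustrative assumptions, not the authors' actual interface.

    # Minimal sketch of a LURE-style revision step (cf. Alg. 2). The threshold
    # and all interfaces are illustrative assumptions, not the paper's API.
    UNCERTAINTY_THRESHOLD = 0.9  # illustrative; the paper tunes its own threshold

    def revise(image, description, object_scores, revisor_generate):
        """Mask uncertain object mentions, then let the revisor rewrite the text.

        object_scores: dict mapping object mention -> uncertainty in [0, 1]
        revisor_generate: callable(image, masked_description) -> revised text
        """
        masked = description
        for obj, uncertainty in object_scores.items():
            if uncertainty > UNCERTAINTY_THRESHOLD:
                # Replace likely-hallucinated objects with a placeholder token
                # before handing the description to the revisor.
                masked = masked.replace(obj, "[IDK]")
        return revisor_generate(image, masked)

    # Toy usage with stub components (scores are fabricated for illustration):
    desc = "A dog plays with a frisbee next to a bench."
    scores = {"frisbee": 0.2, "bench": 0.95}
    print(revise(None, desc, scores, lambda img, text: text))
    # -> "A dog plays with a [IDK] next to a bench."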
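Since the open-datasets row cites MSCOCO, a short loading example may help reproduction. This assumes the standard pycocotools API and a locally downloaded caption annotation file; the file path is illustrative.

    # Load MSCOCO caption annotations with pycocotools (path is illustrative).
    from pycocotools.coco import COCO

    coco = COCO("annotations/captions_val2014.json")  # assumed local download
    img_id = coco.getImgIds()[0]             # first image id in the split
    ann_ids = coco.getAnnIds(imgIds=img_id)  # caption annotation ids
    for ann in coco.loadAnns(ann_ids):
        print(ann["caption"])                # human-written reference captions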
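The Table 6 values in the experiment-setup row translate directly into an optimizer and scheduler configuration. Below is a minimal PyTorch sketch, assuming Hugging Face's get_cosine_schedule_with_warmup for the cosine decay with warmup; the paper does not name its scheduler implementation, and the stand-in model is illustrative.

    # Optimizer/scheduler setup mirroring Table 6's hyperparameters.
    import torch
    from transformers import get_cosine_schedule_with_warmup

    model = torch.nn.Linear(8, 8)  # illustrative stand-in for the revisor backbone

    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=3e-5,             # Learning rate
        betas=(0.9, 0.999),  # AdamW beta
        eps=1e-6,            # AdamW epsilon
        weight_decay=0.05,   # Weight decay
    )
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=50,     # Warmup steps
        num_training_steps=410,  # Training steps
    )
    # Batch size 12 and max length 512 apply to the multi-modal instruction
    # data loader; call optimizer.step() then scheduler.step() each step.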