reproducibilityindex.ai

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Authors: Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate LURE on six open-source LVLMs and found it outperforms the previous best approach in both general object hallucination evaluation metrics, GPT, and human evaluations.
Researcher Affiliation	Academia	UNC-Chapel Hill, Rutgers University, Columbia University, Stanford University
Pseudocode	Yes	The training pipeline is illustrated in Alg. 1. [...] The inference pipeline is detailed in Alg. 2.
Open Source Code	Yes	Our data and code are available at https://github.com/YiyangZhou/LURE.
Open Datasets	Yes	MSCOCO (Lin et al., 2014) is a comprehensive dataset used for image recognition, segmentation, and captioning.
Dataset Splits	Yes	All hyperparameters are selected via cross-validation.
Hardware Specification	Yes	Here, we only need one A100 80G GPU for training, which takes approximately 10 minutes.
Software Dependencies	No	The paper mentions software components such as GPT-3.5, MiniGPT-4, LLaMA, PyTorch, and various LVLM backbones (e.g., Vicuna, LLaMA-Adapter), but it gives no version numbers for any of them.
Experiment Setup	Yes	Table 6: Training hyperparameters. Training steps: 410; warmup steps: 50; max length: 512; batch size of multi-modal instruction data: 12; optimizer: AdamW; learning rate: 3e-5; learning rate decay: cosine; AdamW ε: 1e-6; AdamW β: (0.9, 0.999); weight decay: 0.05.
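
The Table 6 values quoted above can be collected into a small configuration, together with a rough sketch of the learning-rate schedule they imply. The numbers come from the table; everything else is an assumption made for illustration: the table says only "Cosine" and "Warmup steps 50", so the linear warmup shape, the decay to zero, the 0-indexed step counter, and the names `TABLE6` and `learning_rate` are hypothetical and are not taken from the LURE code.

```python
import math

# Hyperparameters as quoted from Table 6 of the paper.
TABLE6 = {
    "training_steps": 410,
    "warmup_steps": 50,
    "max_length": 512,
    "batch_size": 12,
    "optimizer": "AdamW",
    "learning_rate": 3e-5,
    "lr_decay": "cosine",
    "adamw_eps": 1e-6,
    "adamw_betas": (0.9, 0.999),
    "weight_decay": 0.05,
}


def learning_rate(step: int, cfg: dict = TABLE6) -> float:
    """Learning rate at a 0-indexed optimizer step.

    ASSUMPTION: linear warmup to the peak rate, then cosine decay to zero.
    The paper states only the decay type and the number of warmup steps.
    """
    peak = cfg["learning_rate"]
    warmup = cfg["warmup_steps"]
    total = cfg["training_steps"]
    if step < warmup:
        # Linear ramp that reaches the peak on the last warmup step.
        return peak * (step + 1) / warmup
    # Fraction of the post-warmup phase completed, in [0, 1).
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 49, 50, 230, 409):
        print(f"step {s:3d}: lr = {learning_rate(s):.3e}")
```

Under these assumptions the rate peaks at 3e-5 when warmup ends and falls to half of that midway through the remaining 360 steps. A different warmup shape or a nonzero floor would change the curve but not the tabulated values.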