Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Authors: Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LURE on six open-source LVLMs and found it outperforms the previous best approach in both general object hallucination evaluation metrics, GPT, and human evaluations. |
| Researcher Affiliation | Academia | 1UNC-Chapel Hill, 2Rutgers University, 3Columbia University, 4Stanford University |
| Pseudocode | Yes | The training pipeline is illustrated in Alg. 1. [...] The inference pipeline is detailed in Alg. 2. |
| Open Source Code | Yes | Our data and code are available at https://github.com/YiyangZhou/LURE. |
| Open Datasets | Yes | MSCOCO (Lin et al., 2014) is a comprehensive dataset used for image recognition, segmentation, and captioning. |
| Dataset Splits | Yes | All hyperparameters are selected via cross-validation. |
| Hardware Specification | Yes | Here, we only need one A100 80G GPU for training, which takes approximately 10 minutes. |
| Software Dependencies | No | The paper mentions software components like GPT-3.5, MiniGPT-4, LLaMA, PyTorch, and various LVLM backbones (e.g., Vicuna, LLaMA-Adapter), but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Table 6: Training hyperparameters. Training steps: 410; warmup steps: 50; max length: 512; batch size of multi-modal instruction data: 12; optimizer: AdamW; learning rate: 3e-5; learning rate decay: cosine; AdamW ε: 1e-6; AdamW β: (0.9, 0.999); weight decay: 0.05. |
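
To make the schedule implied by the Experiment Setup row concrete, the following is a minimal sketch of the learning rate over the 410 reported training steps. The step counts and peak rate come from the row above; the linear shape of the warmup, the decay to zero, and the names `learning_rate`, `TRAINING_STEPS`, `WARMUP_STEPS`, and `PEAK_LR` are assumptions made for illustration and are not taken from the paper or its code.

```python
import math

# Values as reported in the Experiment Setup row (Table 6 of the paper).
TRAINING_STEPS = 410
WARMUP_STEPS = 50
PEAK_LR = 3e-5


def learning_rate(step: int) -> float:
    """Hypothetical schedule: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Assumption: the warmup ramps linearly from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    # Assumption: the cosine decay ends at a learning rate of zero.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at a few representative steps.
    for step in (0, 25, 50, 230, 410):
        print(step, learning_rate(step))
```

Under these assumptions the rate peaks at step 50 and falls to roughly half the peak at step 230, the midpoint of the decay phase. A real run may use a different warmup shape or a nonzero final rate.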