Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Authors: Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LURE on six open-source LVLMs and find that it outperforms the previous best approach in general object hallucination evaluation metrics as well as in GPT and human evaluations. |
| Researcher Affiliation | Academia | UNC-Chapel Hill, Rutgers University, Columbia University, Stanford University |
| Pseudocode | Yes | The training pipeline is illustrated in Alg. 1. [...] The inference pipeline is detailed in Alg. 2. (A hedged sketch of the inference flow appears after this table.) |
| Open Source Code | Yes | Our data and code are available at https://github.com/YiyangZhou/LURE. |
| Open Datasets | Yes | MSCOCO (Lin et al., 2014) is a comprehensive dataset used for image recognition, segmentation, and captioning. |
| Dataset Splits | Yes | All hyperparameters are selected via cross-validation. |
| Hardware Specification | Yes | Here, we only need one A100 80G GPU for training, which takes approximately 10 minutes. |
| Software Dependencies | No | The paper mentions software components like GPT-3.5, MiniGPT-4, LLaMA, PyTorch, and various LVLM backbones (e.g., Vicuna, LLaMA-Adapter), but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Table 6: Training hyperparameters. Training steps 410, Warmup steps 50, Max length 512, Batch size of multi-modal instruction data 12, Optimizer AdamW, Learning rate 3e-5, Learning rate decay Cosine, AdamW ϵ 1e-6, AdamW β (0.9, 0.999), Weight decay 0.05. |
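
The hyperparameters quoted in the Experiment Setup row map directly onto a PyTorch optimizer and learning-rate schedule. Below is a minimal sketch of that configuration; the `revisor` module is a stand-in for the actual LVLM-based revisor, and the linear warmup shape is an assumption (Table 6 lists only 50 warmup steps and cosine decay).

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Values taken directly from the quoted Table 6.
TRAINING_STEPS = 410
WARMUP_STEPS = 50
BATCH_SIZE = 12   # multi-modal instruction data
MAX_LENGTH = 512

# Stand-in for the actual revisor LVLM being fine-tuned.
revisor = torch.nn.Linear(8, 8)

optimizer = AdamW(
    revisor.parameters(),
    lr=3e-5,
    betas=(0.9, 0.999),
    eps=1e-6,
    weight_decay=0.05,
)

def lr_lambda(step: int) -> float:
    # Linear warmup is an assumption; the paper only lists 50 warmup
    # steps. After warmup, cosine decay to zero at step 410 (Table 6).
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)
```

On a single A100 80G GPU, as the Hardware Specification row notes, this 410-step fine-tuning run reportedly completes in roughly 10 minutes.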
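
The Pseudocode row cites Alg. 2, which describes inference as masking uncertain objects in the base LVLM's caption and having the trained revisor rewrite the masked description. The following is a simplified, hedged Python sketch of that flow; `base_lvlm`, `revisor`, their `describe`/`rewrite` methods, and the threshold value are illustrative assumptions, not the paper's actual API.

```python
def mask_uncertain_objects(caption: str,
                           token_logprobs: dict,
                           objects: list,
                           threshold: float = -1.0) -> str:
    """Replace low-confidence objects in the caption with a placeholder tag.

    The threshold and the per-object log-probability lookup are
    illustrative; the paper derives object uncertainty from the base
    LVLM's token probabilities.
    """
    for obj in objects:
        if token_logprobs.get(obj, 0.0) < threshold:
            caption = caption.replace(obj, "[IDK]")
    return caption

def lure_inference(image, base_lvlm, revisor) -> str:
    # 1. Generate a draft caption with the underlying LVLM
    #    (hypothetical describe() returning caption, log-probs, objects).
    caption, token_logprobs, objects = base_lvlm.describe(image)
    # 2. Mask objects the base model was uncertain about.
    masked = mask_uncertain_objects(caption, token_logprobs, objects)
    # 3. The trained revisor reconstructs a corrected description
    #    from the image plus the masked caption.
    return revisor.rewrite(image, masked)
```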