Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
Authors: Kejia Zhang, Keda TAO, Jiasheng Tang, Huan Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method consistently reduces object hallucinations across 8 state-of-the-art LVMs, validating its efficacy across diverse evaluations. To thoroughly assess VAP, we conduct experiments from five perspectives: Consistency: Evaluating VAP's effectiveness in mitigating hallucinations across eight LVMs. Fidelity: Ensuring that visual understanding and reasoning capabilities are preserved. Compatibility: Demonstrating VAP's orthogonality to other methods and complementary benefits. Efficiency: Reducing computational cost via a lightweight solution achieving 1/8 overhead. Component Analysis: Assessing the contribution of each module through ablation. |
| Researcher Affiliation | Collaboration | 1 Xiamen University; 2 Westlake University; 3 DAMO Academy, Alibaba Group; 4 Hupan Lab |
| Pseudocode | Yes | Algorithm 1 outlines the procedure of our visual adversarial perturbation (VAP) method. |
| Open Source Code | Yes | Project Page: https://kejiazhang-robust.github.io/poison-cure-lvm An anonymous code link is provided in the abstract with sufficient details for reproduction. |
| Open Datasets | Yes | We randomly selected 500 samples from the MS-COCO dataset and generated 9,000 evaluation triplets using POPE's three sampling strategies. We randomly select 1,000 samples from the MS-COCO dataset for evaluation. AMBER and MME serve as comprehensive evaluation benchmarks for multimodal large language models. |
| Dataset Splits | Yes | We randomly selected 500 samples from the MS-COCO dataset and generated 9,000 evaluation triplets using POPE's three sampling strategies. Specifically, we randomly select 1,000 samples from the MS-COCO dataset for evaluation. |
| Hardware Specification | Yes | The experiments were performed using NVIDIA RTX 4090 (24GB), A6000 (48GB), and A100 (80GB) GPUs. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In our experiments, we set the parameters as α = 1/255, β = 8/255, N = 10, and ϵ = 2. Due to the differences across LVMs, we assigned model-specific balancing coefficients σ_i (where i ∈ {1, 2, 3}) and T. |
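
The Dataset Splits entry quotes a random draw of 500 MS-COCO samples and 9,000 POPE evaluation triplets. As a rough illustration only, the sketch below shows how such a draw could be made repeatable and how the quoted counts relate arithmetically. The seed value, the placeholder ID pool, and both helper names are hypothetical and do not come from the paper; the even spread of triplets across images and strategies is likewise an assumption.

```python
import random

# POPE's three sampling strategies for negative objects.
STRATEGIES = ("random", "popular", "adversarial")


def sample_image_ids(all_ids, k, seed=0):
    """Draw k distinct image IDs reproducibly.

    The seed is a hypothetical choice; the paper's quoted text does not state one.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(all_ids), k))


def triplets_per_image_per_strategy(total_triplets, n_images, n_strategies):
    """Per-image, per-strategy triplet count, assuming an even spread."""
    per, remainder = divmod(total_triplets, n_images * n_strategies)
    if remainder:
        raise ValueError("triplets do not divide evenly across images and strategies")
    return per


# Placeholder ID pool; real MS-COCO image IDs would come from its annotation files.
pool = range(40_000)
subset = sample_image_ids(pool, k=500, seed=0)
per = triplets_per_image_per_strategy(9_000, len(subset), len(STRATEGIES))
print(len(subset), per)  # prints "500 6"
```

Under the even-spread assumption, 9,000 triplets over 500 images and three strategies works out to six triplets per image per strategy. Recording the seed alongside the selected IDs is one way to make such a subset recoverable by other researchers.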