Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs

Authors: Kejia Zhang, Keda TAO, Jiasheng Tang, Huan Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results demonstrate that our method consistently reduces object hallucinations across 8 state-of-the-art LVMs, validating its efficacy across diverse evaluations. To thoroughly assess VAP, we conduct experiments from five perspectives: Consistency: Evaluating VAP's effectiveness in mitigating hallucinations across eight LVMs. Fidelity: Ensuring that visual understanding and reasoning capabilities are preserved. Compatibility: Demonstrating VAP's orthogonality to other methods and complementary benefits. Efficiency: Reducing computational cost via a lightweight solution achieving 1/8 overhead. Component Analysis: Assessing the contribution of each module through ablation.
Researcher Affiliation	Collaboration	1 Xiamen University; 2 Westlake University; 3 DAMO Academy, Alibaba Group; 4 Hupan Lab
Pseudocode	Yes	Algorithm 1 outlines the procedure of our visual adversarial perturbation (VAP) method.
Open Source Code	Yes	Project Page: https://kejiazhang-robust.github.io/poison-cure-lvm An anonymous code link is provided in the abstract with sufficient details for reproduction.
Open Datasets	Yes	We randomly selected 500 samples from the MS-COCO dataset and generated 9,000 evaluation triplets using POPE's three sampling strategies. We randomly select 1,000 samples from the MS-COCO dataset for evaluation. AMBER and MME serve as comprehensive evaluation benchmarks for multimodal large language models.
Dataset Splits	Yes	We randomly selected 500 samples from the MS-COCO dataset and generated 9,000 evaluation triplets using POPE's three sampling strategies. Specifically, we randomly select 1,000 samples from the MS-COCO dataset for evaluation.
Hardware Specification	Yes	The experiments were performed using NVIDIA RTX 4090 (24GB), A6000 (48GB), and A100 (80GB) GPUs.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	In our experiments, we set the parameters as α = 1/255, β = 8/255, N = 10, and ϵ = 2. Due to the differences across LVMs, we assigned model-specific balancing coefficients σ_i (where i ∈ {1, 2, 3}) and T.
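
For readers who want to carry the quoted setup into their own scripts, the following is a minimal sketch of how the reported values could be recorded. It is not taken from the paper's code: the class name `ReportedSetup`, the helper `unreported`, and all field names are hypothetical, chosen only to mirror the symbols α, β, N, ϵ, σ_i, and T. The table does not state what each symbol controls, so the sketch assigns no roles to them, and it leaves the model-specific σ_i and T empty because their values are not listed here.

```python
from dataclasses import dataclass, fields
from fractions import Fraction
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReportedSetup:
    # Values quoted in the Experiment Setup row; names follow the paper's symbols.
    alpha: Fraction = Fraction(1, 255)
    beta: Fraction = Fraction(8, 255)
    n: int = 10
    epsilon: float = 2.0
    # Model-specific values that the quoted text does not list (assumed unknown here).
    sigma: Optional[Tuple[float, float, float]] = None
    t: Optional[float] = None


def unreported(setup: ReportedSetup) -> list:
    """Return the names of settings that still have no value."""
    return [f.name for f in fields(setup) if getattr(setup, f.name) is None]


setup = ReportedSetup()
print(unreported(setup))
```

Keeping α and β as exact fractions avoids rounding them before they are needed, and the `unreported` helper makes explicit which settings would have to be taken from the paper itself or its released code before a rerun.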