Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
Authors: Tianyi Bai, Yuxuan Fan, Qiu Jiantao, Fupeng Sun, Jiayi Song, Junlin Han, Zichen Liu, Conghui He, Wentao Zhang, Binhang Yuan
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on the Micro Edit Detection benchmark, which includes carefully balanced evaluation pairs designed to test sensitivity to subtle visual variations across the same edit categories. Our method improves difference detection accuracy and reduces hallucinations compared to strong baselines, including GPT-4o. Moreover, it yields consistent gains on standard vision-language tasks such as image captioning and visual question answering. |
| Researcher Affiliation | Academia | Tianyi Bai (1,2), Yuxuan Fan (3), Qiu Jiantao (2), Fupeng Sun (4), Jiayi Song (5), Junlin Han (6), Zichen Liu (1), Conghui He (2), Wentao Zhang (5,2), Binhang Yuan (1). Affiliations: (1) The Hong Kong University of Science and Technology; (2) Shanghai Artificial Intelligence Laboratory; (3) The Hong Kong University of Science and Technology (Guangzhou); (4) Imperial College London; (5) Peking University; (6) Oxford University. |
| Pseudocode | No | The paper describes the methodology in narrative text and mathematical formulations (Sections 4.1 and 4.3) but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and datasets are publicly released at https://github.com/Relaxed-System-Lab/hallu_med. |
| Open Datasets | Yes | Code and datasets are publicly released at https://github.com/Relaxed-System-Lab/hallu_med. ... Using DOCCI [43] and Visual Genome [28], we develop a pipeline with filtering, semantic edit planning, and controlled image editing. ... We construct and release the Micro Edit Dataset (MED) and the Micro Edit Detection benchmark, targeting fine-grained vision-language reasoning. |
| Dataset Splits | Yes | To prevent contamination, benchmark samples are excluded from the fine-tuning dataset. The final benchmark contains 165 questions evenly distributed across all edit types... The MED-Real Set is created by sampling 50 minimally different image pairs from the MMVP benchmark [54]... This expands the evaluation set to 215 items, combining 165 synthetic edit pairs and 50 real-world pairs, offering a more comprehensive assessment of sensitivity to controlled differences and real-world generalization. |
| Hardware Specification | Yes | All the training processes were conducted using llamafactory [67]. Regarding image resolution and the number of image tokens, we adhere to the original settings specified by each model. Table 5: Hyperparameters for training Qwen2-VL & Qwen2.5-VL models ... GPU: 8 NVIDIA A800 |
| Software Dependencies | No | All the training processes were conducted using llamafactory [67]. |
| Experiment Setup | Yes | In this section, we present all the hyperparameters we used to train the three kinds of models in Table 5, Table 6 and Table 7. All the training processes were conducted using llamafactory [67]. Regarding image resolution and the number of image tokens, we adhere to the original settings specified by each model. Table 5: Hyperparameters for training Qwen2-VL & Qwen2.5-VL models. LoRA Rank: 8; LoRA α: 16; LoRA Dropout: 0.1; LoRA Target: all; GPU: 8 NVIDIA A800; Batch Size: 16; Gradient Accumulation Steps: 8; Warmup Ratio: 0.1; Learning Rate: 1e-4; Learning Rate Scheduler: Cosine; Unfreeze Vision Tower: True |
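
The Table 5 hyperparameters quoted in the Experiment Setup row can be collected into a small Python structure to make two derived quantities explicit. This is a minimal sketch, not the authors' training configuration: the class name `LoraTrainingConfig`, its field names, and both helper methods are hypothetical. The LoRA scaling factor follows the standard definition α / rank. The quoted table does not say whether "Batch Size 16" is per device or global, so the sketch computes the samples per optimizer step under both readings rather than picking one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraTrainingConfig:
    """Hypothetical container for the Table 5 values quoted above."""

    lora_rank: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.1
    lora_target: str = "all"
    num_gpus: int = 8  # 8 NVIDIA A800
    batch_size: int = 16  # per-device vs. global is NOT stated in the source
    gradient_accumulation_steps: int = 8
    warmup_ratio: float = 0.1
    learning_rate: float = 1e-4
    lr_scheduler: str = "cosine"
    unfreeze_vision_tower: bool = True

    def lora_scaling(self) -> float:
        # Standard LoRA scaling applied to the low-rank update: alpha / rank.
        return self.lora_alpha / self.lora_rank

    def samples_per_optimizer_step(self, batch_size_is_per_device: bool) -> int:
        # Samples consumed before each weight update, under an explicit
        # assumption about what "Batch Size" means in the table.
        per_step = self.batch_size * self.gradient_accumulation_steps
        if batch_size_is_per_device:
            return per_step * self.num_gpus
        return per_step


cfg = LoraTrainingConfig()
print("LoRA scaling:", cfg.lora_scaling())
print("Samples/step if batch size is per device:", cfg.samples_per_optimizer_step(True))
print("Samples/step if batch size is global:", cfg.samples_per_optimizer_step(False))
```

The two readings of the batch size differ by a factor equal to the GPU count, so anyone attempting to reproduce the runs would need to confirm the intended meaning against the released code.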