Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Authors: Fuxiao Liu, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct comprehensive experiments to investigate the hallucination of LMMs. Our results demonstrate that existing LMMs exhibit significant hallucinations when presented with our negative instructions, particularly Existent Object and Knowledge Manipulation instructions. Moreover, we successfully mitigate hallucination by finetuning MiniGPT4 and mPLUG-Owl on LRV-Instruction while improving performance on several public datasets compared to state-of-the-art methods.
Researcher Affiliation Collaboration Fuxiao Liu1, Kevin Lin2, Linjie Li2, Jianfeng Wang2, Yaser Yacoob1, Lijuan Wang2 1University of Maryland, College Park 2Microsoft Corporation
Pseudocode No The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code Yes Code and data are available at https://github.com/FuxiaoLiu/LRV-Instruction.
Open Datasets Yes Our dataset comprises 400k visual instructions generated by GPT4... Code and data are available at https://github.com/FuxiaoLiu/LRV-Instruction.
Dataset Splits No The paper mentions training on LRV-Instruction (approx. 399k instances) and using a separate 'evaluation set' of 1000 instances, but does not explicitly provide training/validation/test splits, with percentages or counts, for their own dataset or the other datasets used.
Hardware Specification Yes We trained our models on NVIDIA Quadro RTX 8000.
Software Dependencies No The paper mentions using models and techniques like Vicuna, LLaMA, and LoRA, but does not specify version numbers for these software components or other ancillary software dependencies.
Experiment Setup No The paper states, 'As for the hyper-parameters, please refer to (Zhu et al., 2023; Ye et al., 2023),' deferring the experimental setup details to external references instead of providing them explicitly in the main text.