Automated Multi-level Preference for MLLMs

Authors: Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhao, Fanglong Liu, Haocheng Feng, Jingdong Wang, Yifan Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across public hallucination and general benchmarks, as well as our MRHal-Bench, demonstrate the effectiveness of our proposed method.
Researcher Affiliation | Collaboration | Mengxi Zhang (1,2), Wenhao Wu (3), Yu Lu (4), Yuxin Song (1), Kang Rong (1), Huanjin Yao (1,5), Jianbo Zhao (6), Fanglong Liu (1), Haocheng Feng (1), Jingdong Wang (1), Yifan Sun (1). (1) Baidu Inc., (2) Tianjin University, (3) The University of Sydney, (4) University of Technology Sydney, (5) Tsinghua University, (6) Chinese Academy of Sciences
Pseudocode | Yes | Algorithm 1: The Pseudocode of Incremental Generation for K-rank Preference Dataset. (An illustrative sketch of this style of K-rank generation appears after the table.)
Open Source Code | Yes | Code is available at https://github.com/takomc/amp.
Open Datasets | Yes | Our training dataset contains 1k detailed captions from ShareGPT4V [16], 4k image-text pairs from [34], 4k human-annotated data from [12], and 2k multi-round dialogues annotated by us (the annotation process is detailed in Appendix A.2), forming a total of 11k training instances. ... The whole dataset contains 10k ShareGPT4V [16], 20k Flickr30k [49], 30k VQAv2 [50], and 30k LRV [26].
Dataset Splits | No | The paper describes its training dataset and discusses testing benchmarks, but does not explicitly specify validation dataset splits or percentages for its experiments.
Hardware Specification | Yes | All experiments are conducted with a batch size of 16 on 8 Nvidia A100 GPUs with 40GB VRAM.
Software Dependencies | No | The paper mentions specific models and algorithms used (e.g., LLaVA-v1.5, AdamW, LoRA) but does not provide version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | We adopt LLaVA-v1.5 [3] as our base model for all experiments... We employ the AdamW [41] optimizer for 4 epochs and apply a peak learning rate of 5 × 10⁻⁵ with the cosine decay strategy. To enhance learning efficiency, we incorporate LoRA-based [42] fine-tuning, with a low-rank r set to 64 for both attention and feed-forward modules. (An illustrative configuration sketch based on these quoted hyperparameters appears after the table.)
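
The Pseudocode row above cites the paper's Algorithm 1, whose details are not reproduced in this excerpt. The following is a minimal, hypothetical sketch of what incremental generation of a K-rank preference dataset could look like in general: for each image-prompt pair, K responses are produced at nominally decreasing quality levels and stored in rank order. It is not the authors' algorithm; the function names (`generate_response`, `build_k_rank_dataset`) and the degradation strategy are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of incremental K-rank preference data generation.
# This is NOT the paper's Algorithm 1; names and the quality-degradation
# strategy are placeholders for illustration only.

@dataclass
class PreferenceSample:
    image_id: str
    prompt: str
    ranked_responses: list = field(default_factory=list)  # index 0 = best

def generate_response(image_id: str, prompt: str, quality_level: int) -> str:
    """Placeholder: produce a response whose quality decreases as
    quality_level increases (e.g., weaker generators or injected errors)."""
    return f"response for {image_id} at quality level {quality_level}"

def build_k_rank_dataset(pairs, k: int):
    """Incrementally generate K ranked responses per (image, prompt) pair."""
    dataset = []
    for image_id, prompt in pairs:
        sample = PreferenceSample(image_id=image_id, prompt=prompt)
        for level in range(k):  # level 0 = highest rank, level k-1 = lowest
            sample.ranked_responses.append(
                generate_response(image_id, prompt, quality_level=level)
            )
        dataset.append(sample)
    return dataset

if __name__ == "__main__":
    demo = build_k_rank_dataset([("img_001", "Describe the image.")], k=4)
    print(demo[0].ranked_responses)
```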
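
The Experiment Setup row quotes concrete hyperparameters but the paper excerpt does not state which training framework implements them. As an illustration only, assuming the Hugging Face `peft` and `transformers` libraries and a LLaMA-style backbone (as in LLaVA-v1.5), the quoted settings (LoRA rank 64 on attention and feed-forward modules, AdamW, peak learning rate 5 × 10⁻⁵ with cosine decay, 4 epochs, total batch size 16 across 8 GPUs) could be expressed roughly as follows; the target module names and the LoRA alpha value are assumptions.

```python
# Illustrative only: one possible way to express the quoted fine-tuning
# hyperparameters with Hugging Face `peft` and `transformers`. The paper
# does not state that it uses these libraries; module names assume a
# LLaMA-style backbone and are an assumption, as is lora_alpha.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,                      # low rank r = 64 (quoted)
    lora_alpha=128,            # assumption: alpha is not quoted in the excerpt
    target_modules=[           # attention and feed-forward modules (assumed names)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./amp_lora_out",       # hypothetical output path
    num_train_epochs=4,                # 4 epochs (quoted)
    learning_rate=5e-5,                # peak LR 5 x 10^-5 (quoted)
    lr_scheduler_type="cosine",        # cosine decay (quoted)
    optim="adamw_torch",               # AdamW optimizer (quoted)
    per_device_train_batch_size=2,     # total batch size 16 across 8 GPUs (quoted)
)
```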