Automated Multi-level Preference for MLLMs
Authors: Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhao, Fanglong Liu, Haocheng Feng, Jingdong Wang, Yifan Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across public hallucination and general benchmarks, as well as our MRHal-Bench, demonstrate the effectiveness of our proposed method. |
| Researcher Affiliation | Collaboration | Mengxi Zhang¹,², Wenhao Wu³, Yu Lu⁴, Yuxin Song¹, Kang Rong¹, Huanjin Yao¹,⁵, Jianbo Zhao⁶, Fanglong Liu¹, Haocheng Feng¹, Jingdong Wang¹, Yifan Sun¹ (¹Baidu Inc., ²Tianjin University, ³The University of Sydney, ⁴University of Technology Sydney, ⁵Tsinghua University, ⁶Chinese Academy of Sciences) |
| Pseudocode | Yes | Algorithm 1: The Pseudocode of Incremental Generation for K-rank Preference Dataset. (A hypothetical sketch of such a procedure follows this table.) |
| Open Source Code | Yes | Code is available at https://github.com/takomc/amp. |
| Open Datasets | Yes | Our training dataset contains 1k detailed captions from ShareGPT4V [16], 4k image-text pairs from [34], 4k human-annotated data from [12] and 2k multi-round dialogues annotated by us (the annotation process is detailed in Appendix A.2), forming a total of 11k training instances. ... The whole dataset contains 10k ShareGPT4V [16], 20k Flickr30k [49], 30k VQAv2 [50], and 30k LRV [26]. |
| Dataset Splits | No | The paper describes its training dataset and discusses testing benchmarks, but does not explicitly specify validation dataset splits or percentages for its experiments. |
| Hardware Specification | Yes | All experiments are conducted with a batch size of 16 on 8 NVIDIA A100 GPUs with 40GB VRAM. |
| Software Dependencies | No | The paper mentions specific models and algorithms used (e.g., LLaVA-v1.5, AdamW, LoRA) but does not provide version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | We adopt LLaVA-v1.5 [3] as our base model for all experiments... We employ the AdamW [41] optimizer for 4 epochs and apply a peak learning rate of 5×10⁻⁵ with the cosine decay strategy. To enhance learning efficiency, we incorporate LoRA-based [42] fine-tuning, with a low-rank r set to 64 for both attention and feed-forward modules. (A hypothetical configuration sketch follows this table.) |
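The Pseudocode row names the paper's Algorithm 1 but does not reproduce its steps. Below is a minimal, hypothetical Python sketch of what incremental generation of a K-rank preference list could look like, assuming each rank is derived by further degrading the previous response. The `corrupt` helper, the `PreferenceSample` container, and the hallucinated-word injection are illustrative assumptions, not the authors' procedure:

```python
# Hypothetical sketch of incremental generation for a K-rank preference
# dataset. The paper's Algorithm 1 defines the real procedure; the
# `corrupt` step (hallucinated-word injection) is an illustrative stand-in.
import random
from dataclasses import dataclass, field


@dataclass
class PreferenceSample:
    image_id: str
    prompt: str
    responses: list = field(default_factory=list)  # index 0 = best, K-1 = worst


def corrupt(response: str, noise_words: list) -> str:
    """Degrade a response by injecting one spurious (hallucinated) word."""
    words = response.split()
    words.insert(random.randrange(len(words) + 1), random.choice(noise_words))
    return " ".join(words)


def build_k_rank_sample(image_id, prompt, gold_response, k, noise_words):
    """Incrementally corrupt the gold response so that rank i+1 is
    derived from, and strictly noisier than, rank i."""
    sample = PreferenceSample(image_id, prompt, [gold_response])
    current = gold_response
    for _ in range(k - 1):
        current = corrupt(current, noise_words)
        sample.responses.append(current)
    return sample


if __name__ == "__main__":
    s = build_k_rank_sample(
        image_id="coco_000042",  # hypothetical identifiers and strings
        prompt="Describe the image.",
        gold_response="A brown dog runs across a grassy park.",
        k=4,
        noise_words=["frisbee", "bicycle", "umbrella"],
    )
    for rank, resp in enumerate(s.responses):
        print(f"rank {rank}: {resp}")
```

Deriving rank i+1 directly from rank i guarantees a monotonic quality ordering without human annotation, which is the property a K-rank preference dataset needs.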
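The hyperparameters quoted in the Experiment Setup row can be assembled into a training configuration. The sketch below assumes HuggingFace `transformers` and `peft` (the paper reports no library versions); the checkpoint name, `lora_alpha`, dropout, warmup, and the exact target module names are assumptions:

```python
# Hypothetical reconstruction of the reported fine-tuning setup:
# LoRA (r=64) on attention and feed-forward modules, AdamW, peak LR 5e-5,
# cosine decay, 4 epochs, batch size 16. Library choice (HuggingFace
# transformers/peft), checkpoint name, lora_alpha, dropout, and warmup
# are assumptions; the paper reports no software versions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import LlavaForConditionalGeneration, get_cosine_schedule_with_warmup

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf"  # assumed public LLaVA-v1.5 checkpoint
)

lora_config = LoraConfig(
    r=64,                # low-rank dimension, as reported
    lora_alpha=128,      # assumption: not reported in the excerpt
    lora_dropout=0.05,   # assumption: not reported
    target_modules=[     # attention + feed-forward projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
model = get_peft_model(model, lora_config)  # only adapter weights are trainable

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

num_epochs, batch_size, num_samples = 4, 16, 11_000  # 11k training instances
steps_per_epoch = num_samples // batch_size
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,  # assumption: warmup schedule not reported
    num_training_steps=num_epochs * steps_per_epoch,
)
```

With LoRA rank 64 applied to both attention and feed-forward projections, only the adapter weights are optimized, which is consistent with the paper's note about enhancing learning efficiency.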