Automated Multi-level Preference for MLLMs

Authors: Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhao, Fanglong Liu, Haocheng Feng, Jingdong Wang, Yifan Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across public hallucination and general benchmarks, as well as our MRHal-Bench, demonstrate the effectiveness of our proposed method.
Researcher Affiliation | Collaboration | Mengxi Zhang (1,2), Wenhao Wu (3), Yu Lu (4), Yuxin Song (1), Kang Rong (1), Huanjin Yao (1,5), Jianbo Zhao (6), Fanglong Liu (1), Haocheng Feng (1), Jingdong Wang (1), Yifan Sun (1). (1) Baidu Inc., (2) Tianjin University, (3) The University of Sydney, (4) University of Technology Sydney, (5) Tsinghua University, (6) Chinese Academy of Sciences
Pseudocode | Yes | Algorithm 1: The Pseudocode of Incremental Generation for K-rank Preference Dataset. (An illustrative sketch of this style of K-rank generation appears after the table.)
Open Source Code | Yes | Code is available at https://github.com/takomc/amp.
Open Datasets | Yes | Our training dataset contains 1k detailed captions from ShareGPT4V [16], 4k image-text pairs from [34], 4k human-annotated data from [12], and 2k multi-round dialogues annotated by us (the annotation process is detailed in Appendix A.2), forming a total of 11k training instances. ... The whole dataset contains 10k ShareGPT4V [16], 20k Flickr30k [49], 30k VQAv2 [50], and 30k LRV [26].
Dataset Splits | No | The paper describes its training dataset and discusses testing benchmarks, but does not explicitly specify validation dataset splits or percentages for its experiments.
Hardware Specification | Yes | All experiments are conducted with a batch size of 16 on 8 Nvidia A100 GPUs with 40GB VRAM.
Software Dependencies | No | The paper mentions specific models and algorithms used (e.g., LLaVA-v1.5, AdamW, LoRA) but does not provide version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | We adopt LLaVA-v1.5 [3] as our base model for all experiments... We employ the AdamW [41] optimizer for 4 epochs and apply a peak learning rate of 5 × 10⁻⁵ with the cosine decay strategy. To enhance learning efficiency, we incorporate LoRA-based [42] fine-tuning, with a low-rank r set to 64 for both attention and feed-forward modules. (An illustrative configuration sketch based on these quoted hyperparameters appears after the table.)
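
The Pseudocode row above cites the paper's Algorithm 1, whose details are not reproduced in this excerpt. The following is a minimal, hypothetical sketch of what incremental generation of a K-rank preference dataset could look like in general: for each image-prompt pair, K responses are produced at nominally decreasing quality levels and stored in rank order. It is not the authors' algorithm; the function names (`generate_response`, `build_k_rank_dataset`) and the degradation strategy are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of incremental K-rank preference data generation.
# This is NOT the paper's Algorithm 1; names and the quality-degradation
# strategy are placeholders for illustration only.

@dataclass
class PreferenceSample:
    image_id: str
    prompt: str
    ranked_responses: list = field(default_factory=list)  # index 0 = best

def generate_response(image_id: str, prompt: str, quality_level: int) -> str:
    """Placeholder: produce a response whose quality decreases as
    quality_level increases (e.g., weaker generators or injected errors)."""
    return f"response for {image_id} at quality level {quality_level}"

def build_k_rank_dataset(pairs, k: int):
    """Incrementally generate K ranked responses per (image, prompt) pair."""
    dataset = []
    for image_id, prompt in pairs:
        sample = PreferenceSample(image_id=image_id, prompt=prompt)
        for level in range(k):  # level 0 = highest rank, level k-1 = lowest
            sample.ranked_responses.append(
                generate_response(image_id, prompt, quality_level=level)
            )
        dataset.append(sample)
    return dataset

if __name__ == "__main__":
    demo = build_k_rank_dataset([("img_001", "Describe the image.")], k=4)
    print(demo[0].ranked_responses)
```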
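
The Experiment Setup row quotes concrete hyperparameters but the paper excerpt does not state which training framework implements them. As an illustration only, assuming the Hugging Face `peft` and `transformers` libraries and a LLaMA-style backbone (as in LLaVA-v1.5), the quoted settings (LoRA rank 64 on attention and feed-forward modules, AdamW, peak learning rate 5 × 10⁻⁵ with cosine decay, 4 epochs, total batch size 16 across 8 GPUs) could be expressed roughly as follows; the target module names and the LoRA alpha value are assumptions.

```python
# Illustrative only: one possible way to express the quoted fine-tuning
# hyperparameters with Hugging Face `peft` and `transformers`. The paper
# does not state that it uses these libraries; module names assume a
# LLaMA-style backbone and are an assumption, as is lora_alpha.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,                      # low rank r = 64 (quoted)
    lora_alpha=128,            # assumption: alpha is not quoted in the excerpt
    target_modules=[           # attention and feed-forward modules (assumed names)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./amp_lora_out",       # hypothetical output path
    num_train_epochs=4,                # 4 epochs (quoted)
    learning_rate=5e-5,                # peak LR 5 x 10^-5 (quoted)
    lr_scheduler_type="cosine",        # cosine decay (quoted)
    optim="adamw_torch",               # AdamW optimizer (quoted)
    per_device_train_batch_size=2,     # total batch size 16 across 8 GPUs (quoted)
)
```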