Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models

Authors: Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that these ICD sequences can improve the ICL performance of two LVLMs compared with some strong baselines in Visual Question Answering and Image Captioning, validating that Lever-LM can really capture the statistical patterns for levering LVLMs."
Researcher Affiliation | Academia | "Xu Yang (1,2), Yingzhe Peng (1,2), Haoxuan Ma (1,2), Shuo Xu (1,2), Chi Zhang (3), Yucheng Han (4), Hanwang Zhang (4). 1: Southeast University; 2: Key Laboratory of New Generation Artificial Intelligence Technology & Its Interdisciplinary Applications (Southeast University), Ministry of Education; 3: Westlake University; 4: Nanyang Technological University"
Pseudocode | No | The paper describes the architecture and process in text and diagrams (Figure 2), but does not provide structured pseudocode or an algorithm block.
Open Source Code | Yes | "The code is available at https://github.com/ForJadeForest/Lever-LM."
Open Datasets | Yes | "Our approach is evaluated on MS-COCO [56] for Image Captioning (IC) and VQAv2 [60] for Visual Question Answering (VQA). For each corresponding dataset, we use the train split to construct the DM and use the validation split to evaluate the performance of ICD configurations generated by Lever-LM. More details are given in Appendix A."
Dataset Splits | Yes | "Our approach is evaluated on MS-COCO [56] for Image Captioning (IC) and VQAv2 [60] for Visual Question Answering (VQA). For each corresponding dataset, we use the train split to construct the DM and use the validation split to evaluate the performance of ICD configurations generated by Lever-LM. More details are given in Appendix A." (A hedged data-loading sketch appears after the table.)
Hardware Specification | Yes | "All experiments are deployed on an RTX 3090. All training processes are carried out with mixed precision and 2 RTX 3090 GPUs."
Software Dependencies | No | The paper mentions software such as the AdamW optimizer, OpenFlamingo, IDEFICS, and CLIP, but does not specify version numbers (e.g., PyTorch version, specific library versions).
Experiment Setup | Yes | "The training phase leverages the AdamW optimizer [61] and a cosine learning rate scheduler. We set the learning rate to 1 × 10^-4 and the batch size to 128. We train our Lever-LM for 20 epochs. To implement ICL, we use OpenFlamingo V2-9B [34] and IDEFICS-9B [14] as our LVLMs." (A hedged sketch of this training recipe appears below.)
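
The Open Datasets and Dataset Splits rows describe the data configuration only at a high level: the train split builds the demonstration pool (DM), and the validation split is used for evaluation. The following minimal sketch shows one way such splits could be loaded; it is not the authors' pipeline, and every path, split year, and file name below is an assumption (the paper defers specifics to Appendix A).

```python
# Hypothetical data-loading sketch; paths, split years, and file names are
# assumptions, not taken from the paper or its repository.
import json

from torchvision.datasets import CocoCaptions  # needs pycocotools installed

# MS-COCO for Image Captioning: the train split builds the demonstration
# pool (DM); the validation split is used for evaluation.
coco_train = CocoCaptions(
    root="data/coco/train2017",
    annFile="data/coco/annotations/captions_train2017.json",
)
coco_val = CocoCaptions(
    root="data/coco/val2017",
    annFile="data/coco/annotations/captions_val2017.json",
)

# VQAv2 ships as JSON question/annotation files (names follow the official
# release, but are still assumptions here).
with open("data/vqav2/v2_OpenEnded_mscoco_train2014_questions.json") as f:
    vqa_train_questions = json.load(f)["questions"]
with open("data/vqav2/v2_mscoco_train2014_annotations.json") as f:
    vqa_train_annotations = json.load(f)["annotations"]

image, captions = coco_train[0]  # PIL image and its list of reference captions
print(len(coco_train), len(coco_val), len(vqa_train_questions))
```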
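
The Hardware Specification and Experiment Setup rows together pin down a standard PyTorch recipe: AdamW, a cosine schedule, learning rate 1 × 10^-4, batch size 128, 20 epochs, and mixed precision on two RTX 3090s. Below is a minimal runnable sketch of that recipe, assuming torch.cuda.amp for mixed precision; the linear model and random tensors are throwaway placeholders, since the actual Lever-LM architecture lives in the linked repository.

```python
# Minimal sketch of the reported training recipe. The model and data are
# placeholders; only the optimizer/scheduler/precision settings mirror the
# paper's stated setup.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(512, 512).cuda()          # placeholder for Lever-LM
# model = nn.DataParallel(model)            # one way to span two RTX 3090s
train_set = TensorDataset(torch.randn(1024, 512), torch.randn(1024, 512))
loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size 128

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)    # lr = 1 x 10^-4
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
scaler = torch.cuda.amp.GradScaler()        # loss scaling for mixed precision

for epoch in range(20):                     # 20 epochs
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():     # forward pass in reduced precision
            loss = nn.functional.mse_loss(model(x), y)
        scaler.scale(loss).backward()       # scaled backward pass
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()                        # cosine decay once per epoch
```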