Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning
Authors: Wei Li, Hehe Fan, Yongkang Wong, Yi Yang, Mohan Kankanhalli
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both retrieval tasks (i.e., zero-shot composed image retrieval, visual storytelling image retrieval and visual dialog image retrieval) and text generation tasks (i.e., visual question answering) demonstrate the effectiveness of the proposed method. |
| Researcher Affiliation | Academia | Part of this work was done when Wei Li was an Intern at National University of Singapore. 1ReLER, CCAI, School of Computer Science and Technology, Zhejiang University, China. 2School of Computing, National University of Singapore, Singapore. |
| Pseudocode | No | The paper describes its method in prose and diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at: https://github.com/dhg-wei/MCL. |
| Open Datasets | Yes | It costs approximately 60 A100 GPU days to generate 2.7 million tuples, using image-caption pairs from CC3M (Sharma et al., 2018) as source pairs. |
| Dataset Splits | Yes | We evaluate MCL on three zero-shot CIR benchmarks: CIRCO (Baldrati et al., 2023), CIRR (Liu et al., 2021a) and GeneCIS (Vaze et al., 2023). Figure 5 shows more qualitative results from the CIRCO validation set. |
| Hardware Specification | Yes | It costs approximately 60 A100 GPU days to generate 2.7 million tuples |
| Software Dependencies | No | The paper mentions models like "CLIP ViT-L/14", "OPT-2.7B", "OPT-6.7B", and "Llama2-7B" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | MCL is trained on MMC for 50,000 iterations with a batch size of 64. Both the LLM and CLIP model are frozen. The loss weights λCap and λRet in Equation 7 are set to 0.5 and 1.0 respectively. The temperature τ in Equation 3 and Equation 4 is set to 0.07. |
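A minimal sketch of how the quoted hyperparameters fit together, assuming the retrieval loss in Equations 3–4 is a standard InfoNCE-style contrastive loss (common for CLIP-based retrieval) and Equation 7 is a weighted sum of the captioning and retrieval losses. The function names, toy similarity values, and captioning-loss placeholder are illustrative assumptions, not taken from the paper's code.

```python
import math

# Quoted hyperparameters from the experiment setup row:
LAMBDA_CAP = 0.5   # weight on the captioning loss (lambda_Cap in Eq. 7)
LAMBDA_RET = 1.0   # weight on the retrieval loss (lambda_Ret in Eq. 7)
TAU = 0.07         # temperature in the contrastive loss (Eqs. 3 and 4)

def info_nce(sim_row, positive_idx, tau=TAU):
    """InfoNCE-style loss for one query, given its cosine similarities
    to a set of candidates; the positive candidate is at positive_idx.
    Computed with a log-sum-exp for numerical stability."""
    logits = [s / tau for s in sim_row]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[positive_idx] - log_denom)

def total_loss(caption_loss, retrieval_loss):
    """Weighted objective in the form of Eq. 7 (assumed structure)."""
    return LAMBDA_CAP * caption_loss + LAMBDA_RET * retrieval_loss

# Toy example: the positive candidate (index 0) is most similar,
# so the retrieval loss is small; 2.0 stands in for a captioning loss.
l_ret = info_nce([0.9, 0.2, 0.1], positive_idx=0)
l_total = total_loss(caption_loss=2.0, retrieval_loss=l_ret)
```

The small temperature (0.07) sharpens the softmax over candidates, so even modest similarity gaps produce near-one-hot distributions; this is why CLIP-style losses are sensitive to τ.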