ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

Authors: Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results demonstrate that our method exhibits out-of-domain generalization and interpretability.
Researcher Affiliation | Collaboration | Mingrui Wu1, Xinyue Cai1, Jiayi Ji1, Jiale Li1, Oucheng Huang1, Gen Luo1, Hao Fei2, Guannan Jiang3, Xiaoshuai Sun1, Rongrong Ji1. 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China; 2 National University of Singapore; 3 CATL
Pseudocode | No | The paper provides mathematical formulations and descriptions of the approach, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code: https://github.com/mrwu-mac/ControlMLLM
Open Datasets | Yes | We follow the setting of Ferret to form 1,748 questions (1,548 for test and 200 for validation) based on the LVIS [25] validation dataset, with corresponding box, mask, scribble, and point annotations.
Dataset Splits | Yes | We follow the setting of Ferret to form 1,748 questions (1,548 for test and 200 for validation) based on the LVIS [25] validation dataset, with corresponding box, mask, scribble, and point annotations.
Hardware Specification | Yes | All experiments are conducted on two RTX 3090 GPUs with 24 GB of memory each.
Software Dependencies | No | The paper mentions using 'LLaVA-v1.5-7B [35]' as the MLLM, but it does not specify software dependencies such as Python, PyTorch, or CUDA versions.
Experiment Setup | Yes | Unless explicitly stated otherwise, the MLLM we use is LLaVA-v1.5-7B [35], with T = 5, α = 400, and β = 0.5.
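To give a concrete picture of the reported setup (LLaVA-v1.5-7B with T = 5 optimization steps, α = 400, and β = 0.5), the sketch below illustrates a generic training-free, test-time latent refinement loop driven by an attention-energy objective. It is an assumption-laden illustration, not the authors' implementation: `refine_latent`, `attn_fn`, the energy definition, and the blending rule are hypothetical stand-ins, and the released code in the repository linked above is the authoritative reference.

```python
import torch

def refine_latent(latent, attn_fn, region_mask, T=5, alpha=400.0, beta=0.5):
    """Sketch of a training-free, test-time latent refinement loop.

    Hypothetical interface (NOT the authors' API):
      attn_fn(latent) -> attention map over visual tokens, same shape as region_mask
      region_mask     -> 1 inside the referred region (box/mask/scribble/point), 0 elsewhere
    Defaults mirror the reported hyperparameters: T = 5, alpha = 400, beta = 0.5.
    """
    latent = latent.clone().requires_grad_(True)
    for _ in range(T):
        attn = attn_fn(latent)
        # Energy term: reward attention mass that falls inside the referred region.
        energy = -(attn * region_mask).sum() / (attn.sum() + 1e-8)
        (grad,) = torch.autograd.grad(energy, latent)
        with torch.no_grad():
            stepped = latent - alpha * grad                       # gradient step, alpha = 400
            latent.copy_(beta * stepped + (1.0 - beta) * latent)  # blend with previous latent, beta = 0.5
    return latent.detach()

# Toy usage with a stand-in attention function (illustration only).
def toy_attn(z):
    # Stand-in "attention": softmax over all positions of the latent itself.
    return torch.softmax(z.flatten(), dim=0).view_as(z)

latent0 = torch.randn(16, 16)
region = torch.zeros(16, 16)
region[4:8, 4:8] = 1.0
refined = refine_latent(latent0, toy_attn, region)
```

The point of the sketch is that nothing is trained: only a per-input latent is updated for a handful of steps at inference time, which is consistent with the modest hardware requirement reported above (two 24 GB RTX 3090 GPUs).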