ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Authors: Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results demonstrate that our method exhibits out-of-domain generalization and interpretability. |
| Researcher Affiliation | Collaboration | Mingrui Wu1, Xinyue Cai1, Jiayi Ji1, Jiale Li1, Oucheng Huang1, Gen Luo1, Hao Fei2, Guannan Jiang3, Xiaoshuai Sun1, Rongrong Ji1 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China 2 National University of Singapore 3 CATL |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of the approach, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code: https://github.com/mrwu-mac/ControlMLLM. |
| Open Datasets | Yes | We follow the setting of Ferret to form 1,748 questions (in which 1,548 for test and 200 for validation) based on LVIS [25] validation dataset, with corresponding box, mask, scribble and point. |
| Dataset Splits | Yes | We follow the setting of Ferret to form 1,748 questions (in which 1,548 for test and 200 for validation) based on LVIS [25] validation dataset, with corresponding box, mask, scribble and point. *(see the split-check sketch after the table)* |
| Hardware Specification | Yes | All experiments are conducted on two RTX 3090 GPUs with 24 GB of memory each. |
| Software Dependencies | No | The paper mentions using 'LLaVA-v1.5-7B [35]' as the MLLM, but it does not specify software dependencies like Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | Unless explicitly stated otherwise, the MLLM we use is LLaVA-v1.5-7B [35], T = 5, α = 400, and β = 0.5. *(see the configuration sketch after the table)* |
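
The Dataset Splits row reports concrete counts: 1,748 questions in total, 1,548 for test and 200 for validation. Below is a minimal sanity-check sketch for a local copy of that split; the file name `lvis_ferret_questions.json` and the per-item `split` field are assumptions for illustration only, not the layout of the authors' released data.

```python
import json

def check_lvis_ferret_split(path="lvis_ferret_questions.json"):
    """Verify the counts reported in the paper: 1,748 questions total,
    split into 1,548 test and 200 validation items.

    The file name and the per-item 'split' field are hypothetical;
    adapt them to however the questions are actually stored."""
    with open(path) as f:
        questions = json.load(f)
    test = [q for q in questions if q.get("split") == "test"]
    val = [q for q in questions if q.get("split") == "val"]
    assert len(questions) == 1748, f"expected 1748 questions, got {len(questions)}"
    assert len(test) == 1548 and len(val) == 200, "split counts per the paper"
    return test, val
```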
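
Likewise, the Experiment Setup row fixes the default hyperparameters. The sketch below gathers those stated defaults into a single configuration object; `ControlMLLMConfig` and its field names are hypothetical and do not come from the released code (consult https://github.com/mrwu-mac/ControlMLLM for the actual interface).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlMLLMConfig:
    """Defaults as stated in the paper; field names are illustrative."""
    mllm: str = "llava-v1.5-7b"  # base model: LLaVA-v1.5-7B [35]
    T: int = 5                   # T: number of optimization steps
    alpha: float = 400.0         # α
    beta: float = 0.5            # β

cfg = ControlMLLMConfig()
print(cfg)
```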